CONTENTS


PREFACE


CHAPTER 1

THE NATURE OF RESEARCH IN EDUCATION AND PSYCHOLOGY

HUMAN KNOWLEDGE AND RESEARCH
RESEARCH IN EDUCATION
AREAS OF EDUCATIONAL RESEARCH
RESEARCH IN PSYCHOLOGY


CHAPTER 2

THE HISTORICAL AND PHILOSOPHICAL TYPES OF RESEARCH IN EDUCATION

A. THE HISTORICAL RESEARCH
The differentia of historical research; The method: Theme;
Collection of data; Criticism and classification of data;
Interpretation; The research report.

B. THE PHILOSOPHICAL RESEARCH IN EDUCATION
Philosophy and education; The research task in philosophy;
The theme and treatment.


CHAPTER 3

THE STATISTICAL METHODS OF RESEARCH IN EDUCATION AND
PSYCHOLOGY: STATISTICAL ANALYSIS AND SCALING

THE SCIENTIFIC APPROACH

I TYPES OF MEASURES

II ELEMENTARY STATISTICAL CONCEPTS
Central tendency; Dispersion; Polygon and curve; Skewness
and kurtosis; Correlation; Percentile; Normal probability
distribution; Standard measure; Types of correlation;
Regression weights; Sampling distributions; Standard
error; Nonparametric statistics.

III SAMPLING
Sampling methods; The probability methods; The
non-probability methods.

IV TESTING OF HYPOTHESIS: t and χ²
Postulate, assumption and hypothesis; The Null hypothesis;
The t; The Chi-square.

V ANALYSIS OF VARIANCE

VI PSYCHOPHYSICS AND SCALING
Classical psychophysics; Modern psychophysics; Scaling.

VII SCALING OF ATTITUDE
The Thurstone method; The Likert scale; Remmers’
method; Guttman’s scale analysis; Coombs’ unfolding
technique; Lazarsfeld’s latent structure.


CHAPTER 4

THE STATISTICAL METHODS OF RESEARCH IN EDUCATION AND
PSYCHOLOGY: TESTING AND FACTOR ANALYSIS

VIII MENTAL TESTING AS RESEARCH
Research in psychometrics; Areas of testing; Types of tests;
Test construction:
‘Meaningful scores’
Reliability
Error of measurement
Validity
Factors affecting validity
Norms
Score transformations

IX FACTOR ANALYSIS AS A METHOD OF RESEARCH
What is factor analysis?
The Spearman type of factors
Thurstone’s general factor theorem
The problem of rotation of axes
Reversal of perspective in factor analysis


CHAPTER 5

THE STATISTICAL METHODS OF RESEARCH IN EDUCATION AND
PSYCHOLOGY: PREDICTION AND DECISION PROCESSES

X MULTIVARIATE DEPENDENCIES AND PREDICTION
The nature of variables; Prediction of attribute from
attribute; Prediction of quantity from attribute; Prediction
of attribute from quantity; Discriminant function;
Prediction of continuous variates from continuous variates;
Regression; Prediction of a quantitative variate from a set of
variates; Inclusion of tests in the battery; Prediction of a
number of criteria; Absolute and differential weighting;
Other types of weights; Classification; The curvilinear
relationship.

XI UTILITY METHODS IN STATISTICS
The practicable versus the ideal; Power functions and
operating characteristic curves; Sequential sampling;
Nonparametric statistics.


CHAPTER 6

THE ROLE OF EXPERIMENTAL AND CLINICAL METHODS IN RESEARCH

A. THE EXPERIMENTAL METHOD
1. The Psychological Experiment
2. The ‘Field’ Experiment
3. Mill’s ‘Canons of Experimental Enquiry’
4. Experimental Designs

B. THE CLINICAL APPROACH IN RESEARCH
1. The Nature of Clinical Work
2. The Case-study Method

C. METHODS OF COLLECTION OF DATA
Content analysis; Observation; Questionnaire, Check-list,
schedule, inventory; Interview; Test and Scale.

D. SMALL SAMPLE STATISTICS

E. THE RESEARCH REPORT
Introduction; Data; Analysis of Data; Results and
interpretation; Conclusion and summary.


APPENDIX
REFERENCES
INDEX


TRUTH EMERGES MORE EASILY FROM ERROR THAN FROM CONFUSION 


Francis Bacon


THE NATURE OF RESEARCH IN
EDUCATION AND PSYCHOLOGY


HUMAN KNOWLEDGE AND RESEARCH 


Human knowledge works at two levels. At the primary level it acts
as the basis of a useful human activity, as when a medical man
uses his knowledge to heal patients or as when a teacher expounds
his subject for the benefit of his students. At the secondary level 
knowledge is employed to produce increments in the existing knowledge.
At this level knowledge is not merely utilitarian; it is not
merely utilised or spread but its quantum is increased by howsoever 
small an amount. The activity which produces such quanta of 
knowledge is known as research. Research is obviously not a com- 
pact way of spelling ‘re-search’, for it is not merely a search repea- 
ted. Research is an intellectual activity which brings to light new 
knowledge or corrects previous errors and misconceptions and adds 
in an orderly way to the existing corpus of knowledge. Howsoever 
brilliant and original a restatement may be it can never be dignified 
by the name of research. Knowledge differs from experience in 
being orderly, systematic and insightful. In experience we are 
merely ‘exposed’ to certain events or observations. They occur in 
their own haphazard manner. In research such observations are 
ordered and analysed to answer certain crucial questions, and these 
answers, carefully stated, are the increments of knowledge we are 
after. Such knowledge implies a much deeper and fuller understanding
of the true nature of phenomena. Knowledge gained by
research is science, the word itself being derived from a Latin root
which means ‘to know’. Such knowledge
is scientific in the sense of being objective, impersonal and
of general validity. A state of beatitude or mystic communion may 
give new experience amounting to personal knowledge but unless it 
can be verified under specified conditions it cannot become the
[...] Personal pursuit of the
Godhead, the secret of death and the mystery of life cannot result
in research. The knowledge which accrues from research is a
matter of rational understanding, common verification and
experience and is free from the researcher’s personal subjectivity.

Karl Pearson (1937) said that all science in the end seeks to 
describe phenomena or the facts of nature. This description 
when most general and condensed becomes a formula. Scientific 
knowledge is a series of condensed and compactly expressed gene- 
ralizations about facts of nature. The mere description of 
elementary, isolated facts of nature is not science. That kind of 
description even a novelist would attempt and in very effective 
language too. Such isolated iota of facts, organized and classified, 
are available in a railway time table and telephone directory also; 
but they are not scientific knowledge. Science is not concerned 
with such facts per se but interests itself in facts about facts of nature;
we are here, so to say, at second remove from the concrete facts 
as such. Historical research is an exception to this rule but none 
the less demands an objective evaluation of evidence by rational 
means. Largely, therefore, research does not report facts but
seeks to describe their generality and relationship in time and
space. The relations of facts, their bearing on each other and their
mutual interaction constitute the secondary and derived body of facts
which the scientist tries to describe. Research implies an aug- 
mentation of this order of facts. The discovery of a mere concrete 
fact by itself cannot be research. To rise to this level the discovery 
will have to be related to other known facts. It is the study of 
these complex relationships that constitutes research activity. 
To see things old or new in the web of their relational suspension 
is the goal of research. 


All research is an advance on existing frontiers of knowledge.
This advance assimilates the existing knowledge to such an extent
that it is capable of revising it. Research does not mean meek
[...] equally reopen settled issues and offer new solutions. Research
is thus no respecter of the past. As more and more is known
the need for revision of old knowledge arises and this too is a
legitimate function of research. From what has been said so
far it is clear that knowledge is a matter of continuity. It is a
connected fabric of facts about phenomena which cannot be seen
in mere isolation. There are no gaps in the advance of scientific
knowledge. Research does not arise in a desert of ignorance;
it represents, instead, the expansion of the periphery of existing
knowledge. It must necessarily take us beyond the frontiers of
present knowledge. This advance need not always be horizontal. 
A vertical type of progress would improve our existing knowledge 
by the simple action of subsuming many earlier laws and general- 
izations under a yet wider principle. This discovery of a higher, 
more comprehensive principle is as much research as the discovery 
of a new principle. It is the characteristic of research that it seeks 
by preference principles or facts of universal validity. Both, 
breaking fresh ground and improving existing knowledge, are 
the proper functions of research and imply the continuity of an 
intellectual activity aimed at introducing a sense of design into 
the observable universe of our experience. 

It has been earlier said that research adds to existing knowledge 
in an orderly way. This orderliness is to be particularly emphasized, 
for a helter-skelter presentation of odds and ends of new knowledge,
howsoever valid and accurate, is not the form of proper research. 
Research has the organic unity, sense of design and completeness 
of a work of art or the successful exploit of a task force. Mere 
aimless and confused groping for new knowledge may add to 
knowledge in a desultory and perfunctory way but such activity 
is too dilettante and haphazard to be called research. Even a 
consideration of size is not wholly irrelevant, although size is, 
to a large extent, dependent on the cost in terms of time and money. 
Even so, unless a sufficient quantum of new or more advanced 
knowledge which is complete in itself is shown, an enquiry
cannot rise to the status of acceptable research. 

The knowledge which accrues from research is verified and is 
verifiable by anybody who takes the trouble to do so. The process
by which it has been derived is ‘replicable’, i.e. it can be repeated
and the stated results confirmed. Alternatively, as in historical
research, the evidence presented can be checked in original sources.
Nothing is put forward on the mere prophet-like authority of some 
one else; every truth is made to speak for itself. In this sense too 
knowledge resulting from research is objective and capable of ‘third 
party’ verification, so that it is not just one man’s word against 
another, but the matter can be independently judged by a third 
person also. To summarize, research is the typical process by
which scientific knowledge is advanced in an orderly manner by
unitary quanta of sufficient size on the basis of previous knowledge
and necessary assumptions regarding the nature of the field. It
is a highly specialized activity of so technical a nature that even
the understanding of its results requires a high degree of proficiency 
in the subject concerned. 

There are certain features common to all research whether 
its field of operation is history or philosophy or any of the sciences 
or any of the arts. In the first place, as has already been indicated, 
research is not merely an accretional process of odds and ends being 
collected together. It has necessarily a sense of design and this is 
often foreseen and reflected in the title. Research is an organic
whole and is, for its own purposes, complete in itself. Secondly this 
organic conception of the research works as a criterion of relevance 
in ascertaining the range and the value of the material which is to 
be collected and analysed. A working hypothesis is accepted so 
that we may acquire a preliminary notion of what is going to be 
germane to our purpose. Our interest is normally in a particular 
phenomenon, say the unpunctuality of children attending schools. 
It would be totally irrelevant to collect facts about their height 
and weight or visual acuity as these are obviously not related 
significantly to the question under enquiry. Using a few hypotheses 
we shall trim the directions of our enquiry considerably until we 
are left with only such factors as are likely to have a significant 
relationship with the late coming of children. In history and 
philosophy we go to sources and erudite material on a similar 
hypothesis of relevance. A third feature common to all research 
is that connected with the necessary phase of collection of data 
or evidence. The data of research may be found ready for collection
in the field, or may be gleaned laboriously from books, or
may have to be secured by experimental procedures. But data
there must be; it is the grist without which the research machine 
will not go into action. Fourthly, there is a common stage of 
analysis of the data howsoever collected. This implies the intel- 
ligent ordering and ‘treatment’ of data so that instead of remaining 
an inert and dead mass it becomes dynamic and leads us to draw 
certain general conclusions regarding that class of material. This 
inferential process is quintessential in research and our research 
is as good as the generalization to which it finally leads. In 
historical and philosophical research this critical phase of work 
becomes a form of interpretation or exegesis. The significance 
and inter-relationships of the ordered data are brought out in a 
convincing manner by force of logical exposition. Finally, a lucid
report of how and what has been achieved by research completes
the picture of every research to whatever area of human knowledge 
it might belong. 




RESEARCH IN EDUCATION 


In education and psychology research naturally acquires a 
unique local colouring; it becomes somewhat typical of the field 
in which it operates. Education does not at first sight appear 
to be a body of specialized knowledge. It is an activity, an art, 
a process, a skill. But then every complex skill and technical 
process (and education is at its best nothing less than that) is 
based on a conceptual frame of theory. Knowledge of the factors 
involved is necessary if a task is to be successfully accomplished. 
Therefore education has two sides to it, firstly, the corpus of know- 
ledge on which the activity called education is based, and secondly, 
the skill or the technique aspect of the work. Thus education 
is as much an art as a science and as a science it includes a respect- 
able corpus of knowledge which has a direct bearing on the practice 
of education as an art. In such a corpus of knowledge we include 
facts about the mind of man, its growth and development; facts 
about his capacities and special gifts, facts again about his be- 
haviour at large or in special situations and even facts about 
the educational policies of different countries and their results. 
The size of this fund of knowledge is indicated by the courses in 
education prescribed by various organizations and institutions, 
by the existing literature on various aspects of this hydra-headed 
subject and by the volume of research which is continuously 
reported in dozens of highly specialized educational journals. 
A considerable amount of research therefore takes place as a 
matter of course in the field of education. This is equally true 
of psychology which plays such a significant role in every human 
activity that you need psychology to sell your goods as much as 
to heal your patient. 

There are various reasons why research is conducted in the field 
of education. The primary reason of course is that education 
as a process depends on a corpus of knowledge which continues 
to grow like the total amount of knowledge pertaining to any 
other science. In its scientific aspects education must study the
data pertaining to it, viz., the child, the thing to be learned and
the process, including the teacher. As in other sciences it is not
facts that are to be catalogued but rather the inter-relations of 
facts in all their bearings. It is necessary to go on adding to the 
knowledge we have of these relationships. Knowledge by its 
very nature is never stationary and complete. As new knowledge 
becomes available in other spheres it has an inevitable influence on
education and many consequential researches take place in that 
field also. But apart from this consequential need for additional 
knowledge there is direct growth of our knowledge of the various 
data of education and this is carried forward by means of 
research, the method par excellence of adding to existing knowledge 
in an orderly manner. 

Education is a means of attaining certain objectives and these 
objectives gradually undergo changes, faster in a progressive social 
milieu, slower in a conservative one, but without exception. New 
objectives mean new techniques of achieving revised objectives. 
The addition of new knowledge to a field, say physics, brings 
with it the problem of experimenting with new methods of instruc- 
tion; a changed social milieu or even conception of conduct might 
create new problems of non-conformity in social behaviour. 
Education, being a close ally of social agencies and being co-extensive
with life is in this way in need of constant revision and change to 
maintain a high level of efficiency in its processes. If we conceive 
of education as a process then in the changing social world constant 
revision will be necessary so that the means, the methods, the 
agencies and the machinery of education continue to answer their 
purpose from day to day. In the third place constant research 
is necessary in education to keep it from becoming a mere mechanical
habit, with both the teacher and the learner. The need is
partly reflected in ‘refresher’ courses which are provided because 
teachers tend to become out-of-date, not merely because of an 
advance of techniques which leaves them behind but also on 
account of the human tendency towards rigidity and automatism. 
Whenever human beings are engaged in any activity which 
involves routine and repetition, they run the grave danger of
becoming living machines and sinking deep into circular grooves. 
Worse than this they fossilize and become impervious to new 
trends. Revisions of techniques are so unwelcome that the human
psyche builds up unconscious defences against change and finds
arguments for continuance of old practices. Research is our
gad-fly where this tendency towards complacency is concerned.
It makes for alertness and keeps the practitioners of the art of 
education awake and alive. To maintain the vitality and resilience 
of the process of education we need the encouraging principle 
of research. Education is a vital process involving intelligent, 
growing, sensitive and dynamic human elements and it becomes 
mechanical at the risk of unessencing itself. We therefore need 
research to simply keep out of ruts and stay mentally alive and 
kicking. 

Demand for research arises from yet another quarter. Community 
life is no more a matter of local and municipal concern. Most of 
our projects in the social field are comparatively large and have a 
wide application. Implementation of policy needs careful blue-printing
and pilot programmes so that the people through their
government do not get committed to faulty long-range projects. 
Pilot programmes need careful research to give correct estimates 
of cost and the resulting benefit to the community in terms of 
satisfaction and increased efficiency. If a project is put forward 
as desirable for implementation on rational and normative grounds, 
it will need to be reduced proportionately in size, and tried out
on a small scale. In this experimental set-up the conditions will 
be carefully controlled and the effects accurately measured. If 
this ‘field experiment’ succeeds the government may launch its full-length
programme with confidence in its success. This kind of
small-scale trial is constantly being made in agriculture, medicine
and commerce; and high-precision research pays large dividends
here and is of the utmost importance. 

Research is in fact so widespread, popular and useful an activity 
that it plays a significant role in most social agencies and organiza- 
tions. In the United States the Gallup Poll is like a finger on the 
pulse of public opinion and attitude, and in our own country the 
state-run radio service conducts listener research just as the more 
modernist and advanced commercial agencies employ consumer 
research with quality control. This widespread research activity is
due to the often overlooked fact that research is only a more eco- 
nomical, possibly the most economical, way of learning by experience. 
What the ordinary, old fashioned businessman finds out after costly 
blundering through trial and error, the research worker is able 
to demonstrate neatly on a short scale in an experimental set-up. 
It is only a rare genius who comprehends the subtle relations of
things by a flash of insight, and even that needs to be verified
again in the actual situation. Research is the normal method 
of obtaining such understanding and knowledge of the truth which 
underlies the confusing array of facts of our common experience. 
Finally research has a disciplinary action on the personality of 
the researcher. It makes a better man of him. Firstly it teaches
him to distinguish between personal impression and verified
objective fact, between opinion and knowledge. He comes to 
realize that truth lies at the bottom of a well and is to be got at 
by trying carefully and conscientiously. It makes him sure- 
footed and he is not hasty in arriving at conclusions. He is not 
liable to accept facile rationalizations on trust. This intellectual 
alertness strains out error and inaccuracies from our thinking. 
If the conceptions on which our life is run are to be based on 
objective fact then most of our thinking should be guided by this 
critical spirit of research so that unwarrantable opinions are dis- 
carded early and do not mislead us into bad decisions. But 
research has yet another good influence on those who engage in it. 
They develop generally an enquiring turn of mind. They have 
not merely, as suggested above, a questioning mind but also a 
questing spirit and a hunger for scientific knowledge and a 
desire to add to it. This inquisitiveness about facts of nature is the 
prerogative and hallmark of man. And obviously it does not imply 
the urge to be that universally detested social menace called a
busybody; the research mentality is inquisitive not about the profession
of anybody’s father-in-law or the number of his children but
about the general principles of life such as the nature of fathers-in-law
and the average number of children in an Indian family. Such
curiosity is most rewarding in scholarship and is mainly responsible
for the growth of human knowledge. It is one of the finer effects 
of a sound general education and is of great educational value by 
itself. It is in effect an emancipation of our minds from mere
tutelage and blind acceptance of authority and an education 
which does not do this is wanting in one of its essential qualities. 
A spirit of enquiry also adds to human efficiency and is of supreme 
value to key-men in every field of human activity. An educational 
administrator or any other officer holding a position which requires 
significant decision-making and planning cannot afford to let his 
organization run merely by routine. He needs to watch his machinery
in all its ramifications with a critical eye. His observation
must be systematic and insightful and not formal and ritualistic
like inspecting a guard of honour. The research type of
thinking will enable him to improve the working of his organization 
until optimum efficiency results. All policy-makers need a 
research trend in their thinking as do those in charge of merely 
implementing them. Even when certain decisions are merely 
matters of personal preference (as, for example, how many loaves 
and fishes should go to whom) the ways and means of giving effect 
to them are largely matters of research in planning and imple- 
mentation. 

In an ordinary teacher the research mind is nearly a necessity 
because he must try daily to improve his techniques and methods 
of instruction and education, and must periodically evaluate 
results in a scientific spirit and manner. He should be in a position
to think scientifically and fruitfully about his pupils, his subject 
and his work. In short, it may be said that the spirit of research 
is the characteristic of most useful human activities and is a thing 
to be cultivated for its own sake. Its finest flowering is seen of 
course in its rigorous application to specialized fields of knowledge. 


AREAS OF EDUCATIONAL RESEARCH 


Research in education may be classified according to the distinct
field in which it takes place. One field of research in educa-
tion is that concerned with a time-perspective of things which gives 
us a historical view of education and allied matters. Historical 
research requires a special method which is peculiar to itself and 
has attracted some very able minds. This type of research is,
like history, suspected by the more practical minded of being
somewhat wasteful. They argue that being obsessed with the past 
it is useless if our activities are to be oriented towards the future. 
The justification of historical research is the same as that for history 
and allied studies. The recent past can always be regarded as 
worth our study since it helps us to understand the present state 
of things. Regarding the ancient past some doubts as to its direct 
utility for us may be raised. It has been maintained by the pro- 
tagonists of history that as a subject in the curriculum it has a 
cultural and human value. The difference between men and 
beasts is, among others, that men have a sense of history. It is 
not very certain if a knowledge of history prevents men from committing
the same mistakes again, for it is said that history repeats
itself. But a historical view gives us a clear perception of the 
processes of development and the trends in educational thought 
and practice. It helps us to observe a changing concept of educa- 
tion through generations of men and find our own place on the 
curve of time. The study of the changing relationship between education
and its social matrix is also rewarding and enriches our understanding.
Apart from other things it is satisfying, for a sense
of history (not merely personal memory) is inevitable in man. 
Though no immediate purpose is served by the knowledge that 
a particular text book was used to teach Princess Jahanara or 
that certain subterranean cells housed the pupils of the Nalanda 
University, there is as inevitable a sense of happiness in the know- 
ledge as in our handling a cherished chronometer worn by our first 
ancestor. Historical research has a very wide application in educa- 
tion and is of interest to scholars who have a good grounding 
in the historical method of research and requisite languages. 

Another branch of research is concerned with a rational under- 
standing and interpretation of facts or thoughts of other people. 
This is an exegetic function of research and is mainly philosophical 
in approach. Brilliant interpretation which explains and offers 
a judicious critique of facts or thinking of other people amounts 
to research. The activity is largely literary but the researcher’s 
gift of rational analysis and logical rigour will distinguish it from 
a literary work. The literary critic can, and very often does, offer 
an opinion but a researcher must always quote ‘chapter and verse’ 
in support of his inferences. The matter for research here consists 
of the observed facts of education or the written works of other 
people. An insight into the zeitgeist of a period or a full compre- 
hension of what some great mind thought about education or 
other rational inferences drawn from the practice of education
is a valuable activity and adds to our total knowledge relating 
to the subject of education. Frequently the philosophical type 
of study is directed to the past and then we have a joint operation 
of this interpretative type of study with the rigour of the historical 
method. This area of research requires mainly rational, logical, 
analytical and insightful thinking directed upon a well-defined
group of facts or ideas.

In administration too we need research and this concerns itself 
largely with problems peculiar to the administrator. He needs
a bird’s eye view of the field which he administers so that significant
policy-decisions may be taken. Comprehensive surveys
of the field give him an understanding and insight into his prob- 
lems. He also frequently needs to know the way in which the 
machinery of education is functioning and also to evaluate the 
outcomes of the educational activity he supervises. A systematic 
and intelligent comparison of the administration and organization 
of education in various countries, a cautious consideration of the 
financial implications of educational planning and an assessment 
of the ways and means of meeting the cost at each level are all 
very germane to the task of the educational administrator. 
Whereas every official decision is bound to be based on some 
consideration of facts and figures, research highlights the salient 
features of each sector of administration. The administrator 
is often not an expert in the task of analysing critically the mass 
of facts which the secretariat is ready to fling at him at the slightest 
provocation. The raw material of research needs some expert 
predigesting before it yields its full meaning. It is here that research 
sections make their most telling contribution to administrative 
efficiency. The administrator has only to state his problem in 
its general as well as specific terms and then ask for the relevant 
facts to be put up. The research personnel will then break up 
the mass of facts along new lines so that the critical facts stand 
out in a quantitative form. Not only is it a mistake to underrate 
the significance of research in administration but it may be taken 
as axiomatic that administration will be efficient, objectively 
oriented and progressive to the extent it has recourse for 
its decisions to scientific analysis of facts which is the proper func- 
tion of research. 

Similarly in the organizational field too we need a lot of research 
directed this time upon another class of material. Organization 
of education would normally imply the ‘within the institution’ 
conditions of schooling. In one sense these conditions are of 
greater interest to the educationist since they affect the children 
in schools directly whereas the problems of administration touch 
them only at a second remove. Questions of furniture, equipment, 
building, time-table, books, libraries, extracurricular activities, 
discipline, examination and promotion are all of vital interest to 
the ordinary teacher as well as the educational administrator, 
since it is they that affect the child positively and directly. There 
is great scope for research in all these matters and this is another 
fruitful area for the researcher in education. In the U.S.A. curri- 
culum reorganization has become something of a shibboleth and 
unless we subsume it under organizational matters, it may be regar- 
ded as an equally important area of research. 

The methodology of teaching various school subjects could 
equally well include problems of curriculum, at least within 
each subject. It is by itself a field much vaster than the 
problem of curriculum and provides ground for considerable 
research of the experimental kind. There are a number of 
factors in the instructional situation which need to be manipulated 
for maximum learning to take place with a certain amount of 
general effect of an educative nature. There is firstly the subject 
with its corpus of knowledge and special skills; there is next, 
the class composed of children with their individual differences; and 
finally, there is the teacher with his particular personality and his 
chosen method. Some very complex experimental designs can 
be devised to find out by crucial tests the interaction of these 
various factors in the situation. As the primary and direct task 
of a school is to teach certain subjects, research in methodology 
implying an examination of the factors involved in all their com- 
plex relationships is of vital importance to those who would assess 
the efficiency of a programme of work. 

Last of all, there is the significant and interesting area of psychological
research which is nearest to the activities of this nature in 
the field of pure science. Search for psychological principles which 
would explain the behaviour of children and youth and enable 
teachers to handle them more efficiently is a sine qua non of progres- 
sive education. Psychological facts concerning individuals and 
groups can be studied clinically, experimentally or statistically. 
The school situation is specially suited to the ‘field’ type of study 
where the experimental technique can also be applied sometimes. 
In exceptional cases the clinical approach can also be utilized 
for ameliorative purposes. This is the favourite ramping ground 
of most young researchers in education and is frequently overrun 
by callow enthusiasts who lack the required rigour of regimen. 
Research in many other areas already mentioned can frequently 
be psychological in nature. For example discipline has a psycho- 
logical aspect as well as an administrative and organizational one. 
Similarly a good deal of methodology of teaching is concerned 
with psychological factors. Therefore psychological research 
makes its impact on other areas also and becomes more or less a 
special way of looking at school problems. For example, a delin- 
quent act in the school not only receives the attention of the head 
but is also within the purview of the guidance officer or the school 
psychologist. 

This leads to the general conclusion that in the areas of research 
above demarcated, overlapping is possible. For example, even 
historical research tends to appear as the first chapter in almost 
any research, where the history of previous research on the parti- 
cular topic is summarized. The fields of research suggested above 
are therefore formal abstractions and need not be regarded as 
strictly and mutually exclusive. 


RESEARCH IN PSYCHOLOGY 


In the traditional scheme of studies, ‘Mental and Moral 
Philosophy’ always included psychology as a familiar member 
of a triad which comprises metaphysics and ethics. Philosophy 
is by its very nature a wide and pervasive discipline, and tends 
to permeate and colour every aspect of life, for which reason the 
witty and perceptive G. K. Chesterton remarked that it was even 
more necessary to know the philosophy of your landlady than to 
have a knowledge of her cooking. Philosophy seeks to give us 
a world-view, a complete understanding of man, the microcosm, 
of the macrocosm of the universe wherein he finds himself ‘like 
a miracle in a miracle set’, and a concept of his function therein. 
The earlier approach to the study of mind was largely contempla- 
tive and introspective. In the twentieth century, however, psycho- 
logy was invaded by persons devoted to the methods of natural 
science and currently psychology has more or less broken away 
from the older traditions of philosophy. It now aspires to be a 
science and has developed a rigour of regimen in experimental 
procedures and mathematical formulations of theory which bids 
fair to secure it a status allied to, if not identical with, that of natural 
science. At the same time, in recent years psychology has proliferated
into congeries of parallel theoretic constructs also which 
are in no way different from the contemplative vision of psychology 
as a branch of philosophy. For example, Freud and the Gestaltists 
have produced a scaffolding of theory around observed facts of 
mental life, which provides one of many possible explanations 
of phenomena. This kind of ratiocination is strongly reminiscent 
of the explanatory principles of ‘systems’ of philosophy of 
which modern man is so chary. Sankara’s mental categories 
in Viveka Chudamani or Patanjali’s in his Yogasutra are in essence 
undistinguishable qualitatively from ‘the sheol of the Unconscious’ 
propounded by Freud. Nor is old fashioned psychology incapable 
of experimental verification. The yoga system claims that results 
can be produced if the conditions are met. It would seem that 
the traditional psychology depended less on elegant apparatus 
and more on the naked exercise of human mentality under 
minimum external control. 

Be this as it may, it is true that many rival ‘schools’ exist 
in psychology even now and a rational consideration and compari- 
son of these provides one kind of research in psychology. This 
kind of study becomes largely erudite and exegetic, analogous 
to what in this text has been designated historical and philosophical 
type of research. Rival theories can be contemporaneously 
discussed or they may be considered in the perspective of time, 
when a historical slant will appear in the consideration. For 
example it would be interesting to consider comparatively the 
ancient concept of personality in Hindu psychology with its coun- 
terpart in modern times. This kind of research belongs more 
to psychology as a branch of philosophy than to the experimental 
discipline we have come to associate with modern psychology 
which claims to be ‘a quantitative, rational science’. Even so 
undeniably this is one form research activity takes in psychology. 
Nor is it an entirely idle and fruitless pursuit for when rival 
explanatory concepts exist side by side in a field of knowledge 
it is desirable that, failing a decision in the experimental field, a 
balancing of claims on our credulity should be undertaken on the 
basis of rational analysis of the contending ‘schools’. 

A second kind of research is more pragmatic and aims at produc- 
ing new and improving old methods and means of the study of 
psychological entities. Psychology as a science is concerned with 
the description of mental phenomena. This description has to 
be accurate and precise. The precision comes from the use of 
refined instruments of appraisal. Research in this area comprises, 
on the one hand, the invention of instruments and apparatus of 
control of conditions (such as the use of the one-way glass for 
observation) and of stimulation (e.g. the aesthesiometer or the 
time-wheel) and recording (like the Hipp chronoscope for reaction 
timing) and, on the other, the evolution of testing devices like the 
tautophone, the Rorschach or the aptitude test battery. This 
is a promising and useful line of development and the ingenuity 
of the young researcher should be more systematically directed 
to these objectives than has so far been the case. The most fami- 
liar failing of the present day young psychologist is that he is 
content to use the existing machinery for the collection of data 
and spends all his ingenuity in manipulating the derived obser- 
vations. Persons with a background of physics and a mecha- 
nical bent of mind could very well contribute to the development 
of new types of equipment for serving new purposes and solving 
intractable problems of measurement. Original minds can 
equally well devote part of their research effort to devising new 
methods for the appraisal of different aspects of mentality. 

A type of research activity associated with this kind of inventive- 
ness is that which provides new rationale for the solution of equa- 
tions for additional unknowns. The entire test theory is a splendid 
growth of this kind of knowledge. This activity is analogous to 
what Einstein did when he solved the equation for energy; it is 
in brief a jugglery with symbols but it is highly rewarding, for mira- 
culously enough abstract mathematical formulations are reflected 
sufficiently in the external world to hold good for all practical 
purposes. The formulae for space or time errors in psychophysics 
and for reliability or standard error of a test score are examples 
of this kind of research. The entire development of factor psycho- 
logy is proof of the value of this approach to problems of psycholo- 
gical research. 

Yet another kind of research is still more pragmatic in orienta- 
tion and seeks to solve practical problems. It is by reason of this 
workaday attitude that psychology has come to win the recogni- 
tion it enjoys in the wider public. This is obviously the applied 
aspect of the discipline and makes its contribution to economy 
and efficiency in various fields of human activity. The far-flung 
programmes of educational measurement, the universally used 
methods of personnel selection, the predictive functions of aptitude 
and general classification tests and the demand for ‘human engi- 
neering’ are indicative of the variety of uses to which the know- 
ledge of human psychology can be put in the service of man. 
Psychologists in all these fields of human activity need to experiment
and undertake research to produce results and justify not 
only themselves but the discipline which they profess. In a sense 
psychology possibly is the ultimate science of humanity, for it deals 
with the mind which is according to the best philosophy the ultimate
reality and the substratum of all that passes for existence. 
Psychological research as a problem-solving activity is of consi- 
derable importance and has rightly been extensively cultivated 
as any textbook of applied psychology will show. 

Finally a certain amount of somewhat esoteric research goes 
into the investigations of facts of psychology per se. This is a pro- 
cess of verification of the theoretic constructs of various ‘schools’. 
The Gestalt experiments are of this class. The intention in such 
work is to uncover and map the ground which constitutes men- 
tality. Facts of mental life are not as indisputably established 
and identified as an elementary text might suggest. New prin- 
ciples, determinants and connections are suspected, sought out 
and demonstrated. The field of ability and aptitude has, for 
example, been subjected to searching examination by factor-analysis.
Learning theory and the principles of cybernetics and the 
information theory as applied to the mind are new developments
in the field of psychology and are likely to affect its principles 
considerably. This kind of research is of no direct interest to the 
general public. As reported matter it actually presupposes in 
the reader a degree of expertise and in itself constitutes the arcana 
of an advanced branch of knowledge. Seats of higher learning 
are natural centres for such research activity, the findings of which 
are reported in selected gatherings of learned bodies. The disci- 
pline of psychology is represented in its ‘pure’ form in such activity. 

The various specialized branches of psychology are indicative 
of the range of settings in which research can grow. Some idea 
of this range can be formed from the areas of specialization 
mentioned in directories and national registers which furnish 
classifications of trained personnel and experts. A random 
listing of these includes such categories as clinical, counselling, 
genetic, educational, experimental, comparative, physiological, 
industrial, personnel-vocational, social, abnormal, psychoanalytic, 
psychiatric, quantitative including such branches as experimental 
designs, factorist, those concerned with mathematical-statistical 
models, testing, scaling, personality appraisal and human 
engineering. The range of activity offers a wide scope for research 
workers with almost any kind of predilection and aptitude, and 
is a sign of youthful vigour in this young discipline. Any young 
researcher has reason to rejoice in such extensive freedom of the 
field. 
CHAPTER 2 


THE HISTORICAL AND PHILOSOPHICAL 
TYPES OF RESEARCH IN 
EDUCATION 


A. THE HISTORICAL RESEARCH 


It is only in the abstract as logical entities that we can distin- 
guish between matter and method; in reality they form an organic 
whole and matter determines method analogously as objectives 
determine means, and content and spirit determine style and form 
in literature. In this sense it is possible to speak of historical 
material being treated by a method that is characteristic of, if 
not peculiar to, it. It is undoubtedly true that the method of 
historical research possesses certain special features which reflect 
the peculiar nature of the subject matter of history. 

The term ‘history’ has had its own rather interesting history. 
Etymologically ‘historia’ originally meant learning or knowledge 
achieved through enquiry. In other words it signified knowledge 
gained by a process analogous to research. It is in this original 
sense that Gilbert White used the word in naming his well-known 
book The Natural History of Selborne. Subsequently the meaning 
of the word became more restricted to a record of the past. In 
current usage the term is restricted to human history, for the earth 
and the solar system have a past which we do not include in history. 
In academic terminology history refers largely to political history 
although the recent trend is to trace the political history of a coun- 
try against the background of economic, social and cultural deve- 
lopments. Probably one reason why history has remained a 
record of kings and wars is that it is so much easier to trace the 
record of these and so much more difficult to delineate the fate 
of the obscure commonalty. Modern trends in history are to 
emphasize human life in its wider aspects and to relate political 
events to their contemporary milieu. Proper history begins where 
legend and myth end. History is in the words of Thucydides 
an ‘exact knowledge of the past’. And to this end, according to 
Ranke, the historian tries to find out ‘precisely what happened’. 
History is thus by its very nature a form of specialized research. 
The truth of history is never self-evident; it does not simply exist 
and is not in that sense ‘given’ (Walsh 1951). It needs every time 
to be searched and ‘established’ before it can be accepted. History 
does not begin like a fairy-tale with ‘Once there was a King ...’. 
It starts with a hunt for names and dates and sifts through a lot 
of data before settling these issues beyond doubt. In recent years 
philosophers have been giving some thought to history and as a 
consequence a historical form of idealism has emerged. One 
interesting result of such philosophical insight has been the dis- 
tinction between history and chronicle for which we are indebted 
to the Italian idealist philosopher Benedetto Croce (1916). 
Chronicle, according to him, is a mere sequential list of irrelevant 
events in their temporal order of occurrence. The only necessary 
relationship between the items or events in a chronicle is that of 
time, of precedence and succession in the calendar sense. History 
is on the other hand a consequential chain of events vitally 
related to each other by ‘significant explanations’ (Croce 1916). 
History then is a reliable and meaningful record of the past of 
the human race considered in its wider and more general aspect. 
That humanity is peculiarly privileged to possess and maintain 
such a record and has need to do so is naturally to be considered 
next. 

The differentia of man are many and among these may be 
counted the fact that man is peculiar in possessing a sense of history, 
apart from a personal memory which he may share with the lower 
orders of living beings. Human beings do not merely have the 
gift of memory but also like to remember. Witness the craze for 
souvenirs, mementos, heirlooms. The heritage of man is due 
to his high regard for the achievements of past generations and to 
this sense of history. This sense of lineage, of great forbears and 
ancestors is unique in man and is possibly the cause and justifica- 
tion of man’s historical activities on earth. It is also contended 
that history is utilitarian. For example it is said in defence of 
historical study that it enables us to understand the present. This 
understanding is possibly of the kind implied in the French bon 
mot, ‘tout comprendre est tout pardonner’ (to understand all is to forgive 
all). If human society is in a particular state today it has arrived 
there through successive stages which ‘explain’ its present condition.
The present problems of caste and languages in the Indian 
subcontinent are in this sense ‘understood’ by a reference to the 
past. And just as history by a throw-back enables us to under- 
stand the present, it can equally help us to project our minds into 
the future and form some estimate of expectations therefrom. 
Thucydides thought that history enabled one to ‘interpret the 
future’. It is reasonable to believe that in the dynamics of corpo- 
rate life our present situation is explainable in terms of the momen- 
tum of relevant past events as much as our future may be regarded 
as, in part, determined by the present which bears within its matrix 
the unspent forces of past trends (Popper 1945). Although a 
partial conditioning of the future by the past is admissible it is 
doubtful if history has lessons to teach us. Actually the saying 
that ‘history repeats itself’ seems to imply that errors tend to 
recur under similar circumstances. It is indeed doubtful if conditions
can ever be identical on the time line. Not only the milieu, 
the zeitgeist and conditions change but the very participants are 
differently motivated and swayed by different emotions and ideals. 
Therefore we cannot hope to generalize from individual events 
and draw ‘lessons’ from the past to guide our actions in the present. 
What we expect the past to do for us is to inspire us with courage, 
fortitude, and hope as King Bruce was inspired in his forlorn hour 
of despondency. Beyond such moral ‘lessons’ history has no other 
lessons to impart to the present generation. History serves its 
greatest purpose as a record of the march of humanity on the road 
of progress in which sense it is a very useful compass for taking 
bearings of the directions of our progress and measuring the speed 
of our advance. In a review of the past we trace this route and 
the degree of acceleration in each period. By these considerations 
human society is oriented to its self-imposed task of progress 
and civilization which ultimately lead to an ultima Thule of 
human destiny. 

An advantage of history that is often overlooked is its effect 
on the personality of the historian and the student. There is 
bound to be an inevitable prejudice in favour of what is one’s 
own. Historical research and study require one to overcome 
this megalomaniac distortion and to see things as they are in their 
objective reality. The prejudice to which the historical worker 
is subject is much more severe than that from which the 
scientist may suffer. It is therefore an education for him 
to overcome it in the course of his work. But apart from this 
general benefit the historian gains by his pursuit an incomparable 
advantage of a cultural nature. History ‘humanizes’ and imbues 
the student with a sense of human value, as no other study does, 
not even literature. For literature is concerned with aesthetic 
creativity and human values play a minor and only occasional 
role in it. History, on the other hand, is saturated with the ethic 
of man, and moral issues must arise and be settled in the course 
of historiography. History preserves the heritage of man and 
it is this heritage that challenges us to be worthy of it and rise 
above our brute nature. And finally the romance of history will 
put into the shade the perfervid imaginings of any poetaster. His- 
tory conjures up dreams which ravish our imaginations and trans- 
port us into a world of vanished reality which we love, revere 
and believe in. The human spirit returns to the present from a 
sally into the memorable past greatly refreshed and strengthened. 
And this is yet another reason why mankind will not willingly 
give up history. 


Differentia of Historical Research 


Historical research presents difficulties of a kind not met with in 
any other branch of learning. Many philosophical considerations 
arise which cannot be simply settled out of hand. The first thing 
to note about historical material is that it possesses an ‘ineradicable 
temporal locus’ (Cohen & Nagel 1934) and pertains to the ‘closed’ 
class of data (Walsh 1951). In science we deal with a sampled 
element, representing a class of elements which is ‘open’ to us for 
future sampling. To study the morphology of an organism 
amounts in essence to finding out general facts about that class 
of organism. In history we are not concerned with such ‘open’ 
classes of phenomena. When a historian studies Akbar he is 
not studying him as a class of such persons but as a concrete specific 
case having the aforementioned ‘ineradicable temporal locus’. 
Akbar is a unique event in time and space whereas the case studied 
by most scientists is an example of a class which overrides condi- 
tioning factors of time and space. This characteristic of history 
viz. that it is concerned not with attributes common to entire and, 
often, infinite groups but with concrete individual events definable 
in time and space, makes it a unique type of study and requires 
peculiar methods to approach its data. 
The data of history are also of a unique kind. They are mainly 
the ‘traces’ of past events in the form of various types of documents 
and evidence having a direct or indirect bearing on the event under 
examination. For example, there is never any possibility of our 
studying Akbar in person. Even if a contemporary of Akbar 
had tried he would not have been able to do so. Contemporary 
records lack the perspective of time and their writer is ‘too near 
the event’ to judge of its historical significance. The historian 
who makes a study of Akbar must belong to the distant future. 
In other words Akbar must belong to history and the definite 
past before history can be written about him. Thus the historian, 
by the very nature of his task, is doomed to sift the cold ashes of 
deeds long accomplished. His work is therefore not unlike that 
of Sherlock Holmes examining the scene of a crime. Thus all 
evidence in history is secondary and indirect (Hockett 1955). 
It is from the residue of the past that the living past has to be 
reconstructed. Obviously this is an inferential task involving 
a certain number of assumptions and hypotheses. Not unnatu- 
rally the charge of subjectivity is levelled against the historian. 
It is said that in his treatment he is liable to over-stress some aspect 
of life—cultural, economic, military or political—according to 
his predilection; he is also free to select and highlight certain events 
and slur over others as he sees fit. Historians have not been slow 
in revealing radical differences among themselves. These arise 
from personal and group bias and also from conflicting theories 
of historical interpretation such as the hero-worship school of 
Carlyle or the economic interpretation of history due to the Communist
philosophy. The sciences do not show such divergences 
in the technique of handling and interpreting data. History 
therefore stands in need of various corrections to this tendency 
to subjectivity. 

Again the truth of history is a concrete truth of the closed class 
whereas the truth of science is a general proposition which is valid 
for an open class of data. In history we use all the ‘traces’ left 
by a single concrete past event to learn all about that event in its 
temporal and spatial setting. Although certain assumptive gene- 
ralizations about human nature and conduct may be used now and 
then the task of historical research is radically different from that 
of scientific research. Historical truth is established by the ‘correspondence
theory’, which means that the statement corresponds 
with facts. This is a poor theory to work with as it leaves us begging
the question regarding the ‘facts’ which need to be established 
before they can be stated. The theory provides mainly for testing 
the veracity of source material and for exactitude in expression. 
The second theory employed more usefully to ascertain historical 
truth is known as the ‘coherence theory’ and requires that if a body 
of confirmed historical knowledge exists then a new element must 
fit in with the rest to be acceptable as true, unless of course its 
authenticity is of a higher order and is unimpeachable, when the 
knowledge in hand must stand corrected in its light. This is a 
process of mutual justification by fragments of total historical 
fact and elimination of specious elements by use of conflicts as 
arguments for rejection. 

There is another quality about historical thinking that merits 
a separate mention. Science, it has been said earlier, describes 
phenomena by measurement and condensed symbolic statement 
of relationships. This implies observation and an observer. The 
chief and distinguishing quality of these observations is that they 
are objective. It may therefore be said that the scientist as a 
seeker of knowledge chooses to stand apart from the object of his 
observation. He thus observes the external appearances of things. 
In psychology the Behaviourist school takes up such a strictly 
scientific and objective point of view and aims to study externally 
observable behaviour of the subjects. Contrary to this the his- 
torian is never standing quite apart from his subject. The truth 
of history is not apprehended by observing the externals. The 
externals in this case are the source of history which are mere 
remnants and residues of the past. It is from these debris and 
residues that a complete, rich and vital human past has to be 
constructed. It may be said that this is like constructing the life- 
like image of a pre-historic monster on the basis of a few bone 
fossils. But there is a significant difference. The dinosaur is 
for the zoologist an objective fact of nature, an external reality 
with which he has no point of contact other than that of objective 
observation. Our contact with the participants of the drama 
of history is a deep, subtle and intuitive one (Dilthey 1910, Colling- 
wood 1946). We understand them, their actions, their trials 
and triumphs in a manner quite different from our knowledge 
of the dinosaur. Historical truth is, therefore, intuitively appre- 
hended and requires us to enter deep into the hearts and lives of 
men and women of the past to achieve a proper understanding 
of events as they occurred and how and why they occurred. It 
is only when we write what Croce has called ‘chronicle’ that 
we take up a purely objective attitude and note only the surface 
appearances of things. 

It has been said that science predicts the course of future events 
by its power of generalization whereas history uses present evidence 
to ‘retrodict’ past events. In both a calculus of probabilities is 
constructed on the basis of assumptions of likely relationships and 
correlates. The logical processes of analysis and eduction of 
inference must characterize both except that in the case of science 
abstruse symbolism and technical argot render the meaning somewhat
esoteric whereas in history the rational process of analysis 
and inference is verbalized into everyday speech. 

Having considered some of the special characteristics of the 
historical method and its differences with the scientific approach 
we now proceed to consider this method in some detail. Let us 
first consider the personal equipment of the historical researcher. 
A historian generally has a wide cultural background and a natural 
interest in the wider fate of mankind. His outlook can never 
be narrow and parochial, nor is he culturally hidebound. This 
liberty comes of extensive knowledge of the history of mankind 
in diverse climes, conditions and times. He must possess in 
addition specific, detailed and deep knowledge of the field wherein 
he intends to conduct research. His ability to make use of research 
material, such as inscriptions, coins, scrolls, old documents, monu- 
ments and archaeological finds, depends on expert skill in reading 
these signs. As has already been said he needs personal detach- 
ment and freedom from bias and prejudice more than any other 
class of research worker. Apart from this he needs for success 
certain temperamental qualities, such as capacity for taking 
infinite pains with the subject, imagination and sympathy, under- 
standing of human nature and an intuitive insight into forces 
which have swayed men and affairs from times immemorial. 


The Method: The Theme 


Research in history begins with the search for a suitable theme. 
The theme may be entirely new—a historical survey of a period 
or a people, which has never before been attempted. This will 
be a general history under a new head. Alternatively the 
researcher may split off from the main current of general history 
a special segment for evolutionary treatment e.g. the history of 
costume or music or culture or painting. Such sectional history 
is undertaken by specialists in the related field of knowledge and 
history of music or painting or literature is more within the 
purview of teachers of music, painting or literature, than the general 
historian. But the research is nonetheless historical if the time 
perspective dominates the unfolding of the theme through the 
ages. History of experimental psychology or history of education 
are in this sense specialized branches of a more generalized historical 
research activity and need an equally rigorous application of the
historical method. Apart from these two types of themes
there is a lot of corrective and revisionary research possible in 
history, and frequently research is directed to reviewing old history 
in the light of newly discovered source material. 

A research theme does not appear out of the blue. It might 
almost be called the flower of deep study and extensive information. 
Another requisite is a ‘doubting Thomas’ cast of mind (Whitney 
1947). If a would-be researcher is inclined to read most old 
expositions reverentially on his knees he is not likely to light on
discrepancy, inconsistency and error which open up fresh enquiries.
Therefore it helps to question even established authority when 
it happens to fall under suspicion. If a little checking does not put
the authority in the clear then a further investigation is in order. 
Like any other researcher the historical worker too must of course 
possess a genuinely curious mind interested, by a natural inclination, 
in turning up new knowledge. Familiarity with previous research
in the field is useful not only in suggesting new topics but in 
identifying old ones which have already been tackled one way 
or another and have not enough unexplored ground for further 
research. In fact one of the prerequisites for the commencement 
of research work is a tally of all ancillary and similar research 
through indices and bibliographies to make sure that duplication 
is not taking place inadvertently. 

The discovery of a suitable subject and its exact wording is 
more important and more difficult than is generally believed by 
the young enthusiast. This is in fact a major task in the sense 
that it is by its very nature beyond the ordinary powers with which 
a young aspirant to historical research may be credited. In the 
first place he may, if not properly guided and helped, bite off more
than he can chew in his state of comparative ignorance of the 
scope, possibilities and implications of the topic. It is entirely 
reasonable to attempt ‘The Origin and Development of Teacher
Training in India’. It may dawn only later on the student that 
there is a deterrent paucity of data on this subject. Similarly 
‘The Evolution of Educational Theory in Mediaeval India’ sounds
elegant enough as a topic but once again one may be barking
up the wrong tree. The person who knows the field in detail
and thoroughly alone can say where research can be most advantageously
directed. And yet it is the tyro and the novice who is
required to state his topic formally. Experience has shown that 
a research topic is as good as the scholarship and insight of the
worker. In history the supervisors and guides have the necessary 
duty to indicate the scope and nature of the source material available 
for a particular theme. Where extreme paucity or unreliability
of data renders research fruitless the young researcher should be
warned off in advance. 

Specificity is a virtue in a research topic. Vagueness, ambiguity 
and generality lead to varying and uncertain interpretations and 
add to the confusion which all research is calculated to dispel. 
Sometimes elaborative sub-titles are employed to emphasize this 
specificity in the topic. It is axiomatic that the more general a 
topic is in wording the more inclusive it becomes. ‘Education in 
India during the Mughal Rule’ is comparatively general and wide. 
‘The Instruction of Mughal Princes and Princesses’ is clearly much
more limited. By preference definitive terms should be employed 
which convey a clear meaning acceptable on all hands. The 
question of elegance is not irrelevant. The title of a paper and 
research should read well; the more so in history which favours 
the cultivation of an elegant and eloquent style. There is a 
terminology peculiar to history and it is advisable to use the term
which has an accepted meaning and currency in historical literature
rather than a new or common expression of everyday speech. Picturesque
and figurative titles such as ‘The Grand Rebel’ for a life
of Shivaji are not acceptable for sober research. They may catch 
the eye and arouse curiosity but they are imprecise and really 
meaningless. Research titles even in a romantic field like history 
continue to be matter-of-fact if illumined by scholarly elegance. 
‘The Decline and Fall of the Roman Empire’ is a good example of
such a title. The initiate in historical research is advised to look
upon the title as a solemn commitment which should be examined
thoroughly in advance and consulted about freely before final 
acceptance. 


Collection of Data 


Those who undertake research in education are generally led 
to believe that the historical type of research is simpler and easier 
than the more rigorous scientific type. This is not always true. 
Both types of research require considerable advance preparation 
in the aspirant; in history, a great deal of fairly detailed knowledge 
of the field in general, and in scientific research an intimate know- 
ledge of the techniques involved. The extensive knowledge of 
the history research aspirant helps him in the initial phase of data 
collection by furnishing him with a good range for the application 
of a hypothesis of relevance. What must be included in the data 
on prima facie evidence must show itself to have a direct or indirect 
bearing on the theme. This is the criterion to employ for the 
preliminary collection of data. And here another difference 
with the scientific type of research may be noticed. The historian
is not in a position to sample data as the social scientist or even a 
physical scientist would (by taking a number of readings or obser- 
vations to eliminate errors of measurement and observation). By 
the nature of his case his data has to be inclusive; everything that
concerns the topic and is available must be brought in. This
naturally means a lot of confusion if one is not on guard against it. 
Therefore the first task is to foresee, as far as possible in one’s 
state of partial ignorance, the nature of the data. On this basis 
it is necessary to propose a number of general heads into which 
the entire mass of data may be classified. 

Classifications may be logical, conventional, temporal or any 
other kind. So long as they roughly follow the categories into 
which ultimately the facts will be displayed, they will do. Such 
classifications should be approved by people in a position to advise 
as the tyro can make mistakes here which will require readjust- 
ment later. But in any case the mistakes are not serious and 
do no permanent damage. Once such heads have been secured 
the researcher can forge ahead taking down notes copiously in 
separate books or separate sections of the same book. For example 
in tracing the history of education in a certain period we could
collect material under such conventional headings as education 
of the people, the nobility, the women, the crafts and professions. 
Alternatively we could employ other types of ordering such as 
educational administration, finance, organization, discipline and 
instruction, curriculum, examination. Or we could in the case 
of a dynasty deal with education as a whole, reign by reign, which 
gives a temporal classification, with subsequent sub-heading for 
sectional treatment. The securing of the classified data from 
source literature and material constitutes the labours of Hercules 
for the historical researcher because, normally, if the research 
is really original and worthwhile, the data lie scattered over a 
wide field in all kinds of unlikely places and need a lot of patient 
delving and deciphering to come to hand. 


Criticism and Classification of Data 


Within this classified data, which is like ore from a mine, a lot 
of cross-referencing and collation is necessary to cancel out error
and spurious elements. This is logically so simple a process as 
to be close to mere common sense, but requires a good memory 
for detail. If one has been generous in garnering data some 
elements may have been caught in the net which at the end do 
not appear germane and may at this stage be rejected. The 
rest must be now subjected to what is known as ‘external and 
internal criticism’. In ‘external criticism’ we consider the source 
material itself and decide its genuineness. It is like asking if this 
is the mausoleum of Sher Shah or is it a case of mistaken identity. 
Spurious material, forgeries and other ‘misleads’ are here un- 
covered and this requires a lot of erudition in the researcher and 
a wax-tablet memory. Having settled the authorship and date 
of the document and its genuineness we next proceed to the 
‘internal criticism’, i.e. an evaluation of the content or the infor- 
mation contained in the source. Internal evidence is of course 
used to settle questions of genuineness, date and authorship also
but now we examine what the source has to tell us about the 
range of subjects covered by the theme. Questions of corruptions, 
plagiarisms, variant readings and renderings are a necessary part 
of textual criticism which come up now. This again is a highly 
specialized job and will re-emphasize the truth that historical 
research is in no sense easier than the scientific kind. Finally in
internal criticism we ponder the meaning of the source document. 
There is a positive error here of reading an outside meaning into 
the document and a negative one of not grasping the true meaning 
of the statements as they stand, for there may be a literal meaning 
and perchance a real meaning behind it. The purport of a document
may not be obvious and an effort must be made to understand
the real meaning; in other words the ‘correspondence’ of the 
literal statement with its historical fact must be established. It is 
good to consider motives and subject documents to the suspicion 
of being inspired by slander, rumour, partiality, emotional distor- 
tion, and imaginative hyperbole and embellishment. 

All this is comparable to the filigree work of a goldsmith and 
requires meticulous care and profound scholarship with a deep
knowledge of human nature, and contemporary social and cultural 
climate. 


Interpretation 


The next phase is concerned with the ordering of data under 
each head of classification. Facts supported by references to 
evidence are arrayed in an order that is pleasing to one’s sense of 
design and proportion. The classified heads are also put in the 
order in which the factual story is to be finally told. This is an 
integrative and coordinating process which consolidates the 
material into an organic whole. A coordinated, continuous chain 
of events is developed frequently against a timeline if the period 
covered is a long one. This outline of facts (with reference to 
authority) is the frame of the text of the research report. An 
additional task is that of interpretation of facts at this stage;
because the facts themselves merely provide the chronicle of
events, a mere temporal sequence. In exegetic exposition
we raise chronicle to the status of history. The historian
must show an understanding of the sequence of events
and should offer the ‘significant explanation’ of Benedetto Croce’s
recommendation, where necessary. Facts in history show 
a lateral relationship to other contemporary facts, e.g. a 
cultural phenomenon may be laterally related to political 
or economic phases. A vertical relationship of preceding facts 
with succeeding ones along the timeline is also possible and 
could supply another aspect of the ‘significant explanations’
we are seeking. 

Interpretation of historical fact as duly established by scholarship 
and research bristles with many difficulties. The main charge,
as has already been pointed out, is that of subjectivity. In spite 
of underlying philosophical conflicts and personal prejudice a 
‘common historical consciousness’ should emerge if the correct 
historical method is applied. Obviously historical objectivity is 
not the same as that of the scientist on account of the need of the 
historian to apprehend his form of truth from within by an act
of intuition. Were he to adopt the detachment of the scientist 
he would become a mere chronicler and miss all the ‘significance’ 
of history. He has to achieve a truthful reconstruction of the 
past in all its complex relationships and tendencies by imaginative 
and yet logically valid means. This work partakes of the nature 
of the creativity of the artist without sacrificing any of the loyalty 
to truth. To do this the historian needs not the glass eye of the 
camera which may serve the turn of the scientist but the living 
human eye full of light and moisture. “The perspective theory’ 
suggests that the differences among historians are largely differ- 
ences of points of view and that if many points of view are inte- 
grated a complete picture will emerge. A research worker will 
be usually treading a wire which divides chronicle from history 
or objective skeleton of fact from the living reality, in attempting 
an interpretation of history.


The Research Report 


With this sketch of the text in hand the task of presenting all 
this in a suitable form is faced. This is a matter of expositional 
strategy and is of great importance in history, for the historian 
is by convention a gifted exponent of an elegant style. Thucydides
complained that veracity deprived his style of colour and this 
is a famous dilemma of the historian. The stress has to be on 
veracity of course. The colourful presentation should not falsify 
facts; it should rather make them vivid. In any case a judicious 
restraint with scholarly elegance and occasional eloquence will 
be in order. Perspicuity, clarity, precision and economy are
regarded as eminently desirable in the style of a research report, 
for the expression is regarded primarily as a medium of transmission
of ideas. If it runs the risk, by reason of extrinsic meretriciousness,
of misdelivering a purport or message then it
shall have failed in its primary purpose. Therefore the research 
report in history has to strike a mean between bald narrative and 
magniloquence; frugality must temper fullness of expression. 
Illustrative material like pictures, sketches, diagrams, tables are 
also to be considered in the course of the exposition. The golden 
rule about these is that wherever they materially improve and 
facilitate understanding on the part of the reader they may be 
incorporated. As mere ornamentation they have no place in a 
research text. The style should be impersonal as far as possible, i.e.
the first person singular, ‘I’ should be eschewed as far as possible. 
The research worker reports his findings by standing apart from 
them lest his personal attitude may let slip a personal opinion as 
a confirmed fact. Fact in history has many degrees of credibility 
and in the text this should be always clearly indicated. Wishful 
writing can convert a plausible fact into an established and proven 
thing and this tendency has to be guarded against. 

The parts of most types of research reports are identical. These
normally include a title page, a contents table, sometimes a
foreword and a page of acknowledgements. These preliminaries
are followed by the main body of the text which includes chapters 
on introduction, survey and sources of data and methods of its 
collection, organization of heads of classification and ordering of 
data, the analysis and interpretation of data, the conclusion and 
generalization and a summary. List of references, appendices and 
index wind up the textual labours. References in the body of 
the text are dealt with either in the footnotes or all together at the 
end in the listed references. It is largely a matter of editorial 
convenience but normally it is advisable to deal with references 
summarily at the end and not clutter the text with footnotes. On
the other hand if references are pointers to quotations which 
must be reproduced then footnotes rather than appendices should 
be used to furnish them. In any case competent advice should
be sought to evolve a uniform policy which should thereafter be 
consistently followed without exception to avoid confusion. It is 
a good idea, if the referencing is complicated and symbols and 
abbreviations are used, to include among the preliminary pages a 
page elucidating the mode of referencing. The format and get-up 
are a minor matter which contribute to the aesthetic value of 
the report. For typescript presentations such as are made for
university degrees the standard thesis size 9 by 12" should be
used. 


B. THE PHILOSOPHICAL RESEARCH IN EDUCATION 
Philosophy and Education 


The term philosophy of education is regarded as somewhat 
presumptuous by the more conservative type of person. It is 
felt that education is a pedestrian, practical activity whereas philo- 
sophy is concerned with ultimate and fundamental issues of life, 
death and the nature and existence of the Godhead, and at least 
a direct and immediate relation between the two is not apparent 
to the naked eye. As against this some of the most outstanding 
thinkers—from Plato to Dewey—have found education a prime 
concern and have been inclined to take it rather seriously. The 
reason is that if philosophy in its last analysis is a way of life, 
education is the method of inculcating that way of life in others. 
And even the details of that method are determined by one’s theory 
of knowledge, concept of reality and the mind of man. The 
ends of human life and the means of attaining those ends are 
both important problems of philosophy. Therefore no serious 
harm is done to philosophy if it is brought down from the high 
empyrean to consort with an everyday concern like education. 
Since the child is the father of the man it is but right that we 
approach him with some philosophy at our disposal. 

In modern times a good deal is being written regarding the 
philosophy that should guide and mould our educational practices 
and objectives. Much able thinking has been directed towards 
these questions. Even so when one speaks of research in philosophy 
of education one is not easily understood. When so much discursive 
and systematic thinking has been done on education the need for 
analysing and systematizing it would arise. This is one business 
of philosophical research in education. Apart from famous 
educationists such as Comenius, Pestalozzi or Froebel very few 
persons have stated their philosophy of education formally in 
the manner of Plato and Dewey. The obiter dicta type of litera- 
ture however is plentiful and needs collation and coordination, 
and research could quite fruitfully undertake such tasks. A person 
like Gandhiji was interested in education and gave much serious
thought to the subject in its national context and from the stand- 
point of his own philosophy of life. It could even be that he had 
a systematic philosophy of education which he could never formally 
state. Or, at any rate, he had some ideas on education which 
reflected his peculiar genius and view of life. All this could be 
equally true of a writer like Tagore or Tulsidas. It could be
that a group of persons could be homogeneous enough to be 
regarded as representing a particular view of life and education. 
To make a careful study of the writings of all such persons and 
extract from every source relevant material and present it all 
well organized as a formal statement of the educational ideas, if 
not philosophy, of the person or persons, is the kind of activity 
which research in educational philosophy undertakes. It could 
even be postulated that educational principles and practices of a 
particular period are dominated by a particular philosophy which 
needs to be abstracted and stated formally and directly. It is, 
for example, quite possible to speak of the philosophy of Later 
Vedic or Buddhist education if it can be shown that the practices
were based on a systematic view of life. An analytical and critical 
appraisal of the philosophy of an educational thinker can also be 
classed as research. His ideas in such a critique are not merely 
opposed and found fault with but they are examined closely in 
relation to his other works, traced to their origins and collated 
with those of parallel thinkers. Modifications and discrepancies 
if any are noted and explained and an attempt is made to understand
his inspiration and his message and to evaluate his contribution
to educational thought.

Philosophical and historical types of research have many points in 
common and primarily they are both very erudite pursuits having 
mainly to do with the study of relevant literature, which is invari- 
ably quite considerable. Not infrequently the two types of 
approach interpenetrate or flourish within the same volume side 
by side, as they are complementary rather than exclusive. There 
are however certain differences between the two methods. In 
philosophical research our interest is primarily in ideas and the 
theory or principles underlying educational practice rather than 
in a particular past with its ‘ineradicable temporal locus’. The 
truth of history is always concrete and particular; philosophy 
seeks the principles of things which constitute education. Philosophical
insight must of necessity generalize and seek out by preference
the essence of things, the ideological frame of concrete fact.


The Research Task in Philosophy 


It should be admitted that philosophical research does not 
require any very elaborate technique. It is largely a matter of 
erudition and analytical insight and synthesizing ability. It is a 
bookish business, even more so than history, having to do largely 
with the close perusal of authoritative books and venerable tomes. 
It needs naturally a good knowledge of philosophical thought in 
general and familiarity with the major schools and trends. What 
it needs more than anything else is the gift of acumen, of pene- 
trating insight and an analytical faculty which enables the re- 
searchers to smelt the rich vein of pertinent thought concealed in 
the scattered source material and separate it from the dross of 
irrelevancy. Also in the overview of a period the general trends, 
movements and governing principles are observed and traced by 
the exercise of this faculty of insight which enables us to see 
the wood in spite of the trees. This ability is directed with equal
profit to the study of private lives and achievements of persons
as much as to the evocation of the zeitgeist of a period or era,
remarkable for any particular educational achievement. 

This insight and analytical faculty helps greatly in the inter- 
pretation of the data that is secured. The task of interpretation 
is chiefly that of ascribing a significance, meaning, purpose, and 
relatedness to a common end, to an apparently heterogeneous 
mass of data. Philosophy offers more freedom to the worker than 
even history in this area and the charge of subjectivity is a good 
deal more justified. Tendentious writing is quite common and 
seems to reveal as much of the writer as of his subject. The research 
worker has therefore to be very much on his guard against using 
the theme as a pulpit for his own sermons. ‘No man can walk 
forth save upon his own shadow’, said Walter Raleigh and we 
are, as thoughtful men, doomed to judge others from our own 
points of view. A little conscious effort will however keep one 
detached from the subject so that his or her life and work (or of 
the period one studies) is described and judged objectively. The 
task of reading a significance and meaning into the data is a form 
of creativity which requires a touch of genius. If history clothes 
the dry bones of facts with the flesh and blood of a vanished but
once vital reality, philosophy must breathe life and spirit into 
this resurrected apparition. This amounts to capturing the 
moral and spiritual heart-beat of a generation, or of a personality 
long dead. Deep tact and sympathy are necessary for this with
a minimum of that egotism from which subtle and undetectable
falsifications arise. 


The Theme and Treatment 


The stages of work are not different from those involved in 
historical research. There are two major types of studies, as was 
earlier indicated. One is concerned with ascribing a system of 
thought or certain dominating concepts and master ideas to a 
nation, people, community or other group at a given time and
demonstrating them as reflected in their life, achievements, practi- 
ces and literature. The other gives us a critical appraisal of the 
thought of a great personage or group of persons and demonstrates 
it in his or her or their utterance, private or public, life and achieve- 
ments. To look for a suitable topic of either kind one must have 
a wide background of relevant study. As in historical research,
topics easily suggest themselves to those who are deep in the subject. 
Philosophical research is however free of a limitation—and a 
very crippling one—from which historical research suffers. In 
philosophy it is not necessary to foresee, in a state of partial if 
not also considerable ignorance, the possibility of a philosophy 
depending on the availability of reliable source material. If 
there is a record of the period it can be studied for the discovery 
of a ruling concept and philosophy. Human beings, except in 
the lowest state comparable to that of animals, cannot live without 
a reference frame of ideas which can be designated as their philo- 
sophy or view of life. Wherever human life exists some conceptual 
framework of that life must also exist and is possibly capable of 
theoretic abstraction and evaluation. This is not true of history, 
where research is entirely dependent on the existence of pertinent
traces. As against this the tie-up of philosophical research with 
concrete detail of data is looser and permits free play of interpretative
genius which ranges from sheer vagary to controlled
and judicious insight. This takes away something from the 
exactitude of research and reduces its scientific correctness. The 
historian has a ready criterion of correctness in his evidence of
source data. A pervasive element like philosophy offers endless 
scope for research. It is coexistent with the fact of human conduct 
and life. Using the term in a peculiar sense people speak 
of the philosophy of science or history or, as an extreme case, even
of married life and cooking. Philosophy in such usage means 
the related theory which arises from and in return governs a given 
practice. Even though the absurdity of such nomenclature is 
patent it is true that theoretic constructs and a consideration of 
why and wherefore (which are primarily a form of philosophizing) 
can be conceived of and produced for any human activity. There- 
fore philosophical research themes are comparatively easy to come 
by. While you can philosophize on anything under the sun, not all
such lucubrations necessarily constitute research. Its
very pervasiveness makes the selection of a research theme in 
philosophy something of a hazard that you may come out with 
a subject that is too jejune or trivial in import. Therefore unless 
a theme is capable of yielding a system of thought its investigation 
will yield nothing more than a discursive conglomeration of ideas,
not worthy of a researcher’s attention. For example, if one proposed
to propound the philosophy of education during the mediaeval
period of Indian history or the philosophy of education in the
works of the saint poets of Hindi the question may be raised 
whether we are not looking for the proverbial black cat which is not 
in the dark room. Themes like educational psychology in the
poetry of Surdas or the psychology of Patanjali are to be regarded 
as a variant on philosophical study and should be included under 
this type of research. This will go to show how risky it is to
propose a research on the presumption that the subject holds 
within its matrix the matter of erudite research. 

Apart from seeking competent advice the researcher is advised
to examine a preliminary sample of literature and judge for himself 
the suitability of the material for the proposed research. One may 
wade through a whole epic like the Ramayana, the Mahabharata or 
the Shah Namah and not find enough material to rig up even a 
scaffolding for a proper philosophy of education even on the basis 
of such indirect evidence as exists. In most cases the writer 
or writers will never write a single sentence directly about educa- 
tion except accidentally. It is from the drama of the story or 
tangential consequences of utterance on other matters that a 
notion of educational concepts is to be formed and this kind of
inference runs the risk of ascribing falsely to the author
views which his dramatis personae were meant to uphold in the
action of the story. These examples illustrate mistakes to which 
most philosophical type of research in education and psychology 
is prone, viz. that of starting off on a false scent. Probably 
nowhere is the danger greater. 

For the rest such research differs little from the historical.
There are the usual stages of classification and collection of data
from original sources; its analysis and careful sifting; its organization
into a meaningful whole so that an intelligent circle of ideas
is exhibited with neatness and economy; followed by elucidatory
and evaluatory comment and a final summary statement. What 
has been said regarding the write-up and presentation of histori- 
cal research applies here also and the sequence of chapters is 
similar in the majority of cases. From the point of view of me- 
thod, both historical and philosophical researches are in approach 
and spirit similar, except that the one is reconstructive of the past 
in effect and the other has mainly an exegetic purpose in view.


CHAPTER 3


THE STATISTICAL METHODS OF
RESEARCH IN EDUCATION AND
PSYCHOLOGY: STATISTICAL
ANALYSIS AND SCALING


The Scientific Approach 


The methods of research so far considered are of a class which
may be called erudite and exegetic and are grounded mainly 
in the perusal of appropriate literature. Their fields of opera- 
tions are chiefly libraries and sometimes museums and less fre- 
quently archaeological sites and edifices. The field of operations 
of research in the sciences is by contrast the laboratory, and the 
forces and elements of nature are subjected to experimental exa- 
mination there. Psychology which as an academic study aspires 
to the status of a science makes frequent use of this kind of set 
up for its investigations. The social sciences among which educa- 
tion may be listed carry out ‘field work’ in its natural social setting 
using a minimum of experimental control. The scientific method 
is distinguished from the methods of history and philosophy by 
its extreme objectivity and the rigour of its regimen. Just as his- 
torical research is illumined and enlivened by its humanism and 
intuitive understanding, scientific research is noteworthy for its 
impersonality and detachment. Insight into the human heart 
and head is replaced by insight into the designs of nature con- 
cealed in the mass of data. The scientist seeks to uncover the 
secrets of nature by finding out the principles of things including 
human and animal behaviour. He uses typical samples of data 
and by careful control and measurement evolves general proposi- 
tions which will hold true for the entire mass of that type of data. 

Although the researcher using scientific methods is largely
wedded to this kind of objective and approach, his interest centres
round three nuclei in the scheme of things, and methods have
been classified according to this varying emphasis of interest into
clinical, experimental and those pertaining to univariate and
multivariate statistical analysis of individual differences. The
distinguishing shift of emphasis in these is of the following nature:
in the clinical method a global picture of datum or data is obtained
for a complete understanding of a particular condition of the
‘case’ or ‘cases’. As against this, in the experimental method the
paraphernalia of the laboratory is used to control conditions and
use various factors in such a manner that the effect of the factor
of interest to the experimenter is highlighted. And finally, in
the method of individual differences the interest is in the variable,
its statistical characteristics which promote generalization, and
its points of contact and interdependency with other variables;
a variable being any observed attribute which varies from person
to person such as sex, or age or amount of ability.


I TYPES OF MEASURES 


All these major types of methods in scientifically oriented 
research use and depend on measurement and classification, the 
nature of which is of importance to the research worker in these 
fields. Measurement is a matter of progressive refinement. At 
the crudest end we have the nominal type which classifies objects 
on the basis of a single attribute into mutually exclusive categories. 
It is by the use of this kind of estimation of value that a green- 
grocer assigns vegetables of large and small sizes to two groups. 
This type of scale implies only the determination of equality in 
a very rough sort of way as the equality is not yet measured in terms 
of any unit that being in itself a superior kind of measurement. 
We can use this form of measurement for such statistical purposes 
as require only a count of cases in mutually exclusive categories. 
In other words, it permits only the enumerative type of statistics 
using frequencies and classes as the available data. 

Ranking procedures improve upon this crude type of scale by 
allowing inter-person or inter-item comparisons and judgements of 
greater and less, over and above the judgements of equality which 
characterize the nominal scale. This is the ordinal type of scale 
which is frequently met with in education and psychology. This 
is based on a ‘monotonic’ increasing function which implies incre- 
ments by unit value from individual to individual placed in an 
increasing or decreasing order regardless of amount or degree 
of inter-personal difference. Percentiles, which give the relative
standing of individuals, are the permissible statistics to be used with
this type of measure. A drill sergeant who puts his ranks
in decreasing order of height uses this type of process in doing
so.

Interval scales which are next in order of refinement are capable 
of determining equality of intervals between measured objects, 
i.e. a measure of inter-person or inter-item difference is available. 
With the invention of such scales we enter the area of measurement 
proper as it is commonly understood. We are here in a position
to say that A is twice as much heavier than B as B is heavier than C.
This relationship means algebraically that

$$\text{if } B - C = x, \quad \text{then } A - B = 2x.$$

On a scale we could show these relations as

    C         B                   A
    lower end                     higher end

the distance BA being twice as much as the distance CB. We could
in ordinary speech say that X is as superior to Y as Y is superior
to Z in respect of some attribute. These inter-personal distances
along a scale are given by the interval scale. It may be noted
that if C, B, A are given arbitrary values of 1, 2, 4 and then a constant
5 is added or a multiplier, say 2, is used the relationships
as to relative differences in magnitude will not change. For
example adding 5 all over we get

$$C = 6, \quad B = 7, \quad A = 9$$

or multiplying by 2 we get

$$C = 2, \quad B = 4, \quad A = 8.$$

The general truth that the distance between A and B is twice the
distance between B and C holds in either case except that in the
multiplying situation this difference is also multiplied, increasing from
1 to 2 and from 2 to 4. This process therefore belongs to the general
‘linear’ relationship class which is expressed in algebra by the
equation

$$x' = mx + c$$

where x' is the transformed scale value, x the value we may start
with, m any multiplier and c any additive constant. The application
of both these leaves the relative inter-personal differences
unchanged. The presence of c, an additive constant, in the equation
should be carefully noted. What it really means is that it does 
not matter where we begin on a scale so long as the persons occu- 
pying relative positions on the scale remain stationary but for a 
proportionate increase or decrease in the size of all differences due 
to multiplication right through by an integer or a fraction. In
essence this means that we do not have a fixed starting or zero 
point for the scale, which is a handicap. In physical science we 
come across extensive scales like those for weight and distance or 
area or volume which differ significantly in regard to the notion 
of a zero value from what are known as intensive scales such as
apply to temperature. Extensive scales are the complete scales 
providing a zero point (where the thing has no weight, no exten- 
sion) and equal intervals (which give measurable inter-personal 
differences). Intensive scales do not possess a meaningful zero, 
e.g. the zero readings of temperature scales are entirely conventio- 
nal. Psychological entities like mental ability or attitudes of 
approval and disapproval also lack a meaningful zero and have 
to be treated like intensive scales of science. Statistically the inter- 
val scales permit calculations of parametric statistics like mean, 
standard deviation and the Pearson coefficient of correlation 
which is based on these latter. 

The model of the extensive scales is the ratio scale which pro- 
vides equality of ratios on the strength of a fixed starting point. 
Starting from this point and taking equal intervals (as in the inter- 
val scales) we can regard two values as having a ratio relationship 
of a fixed nature. The equation expressing this type of relationship
belongs to the similarity group and is

$$x' = mx$$

For example the value 12 is 4 times the value 3. This relation
holds only if we are not free to move the zero on the scale of measures.
Thus if a constant, c = 5, is added to the two values the ratio
relationship no longer holds, for 12 + 5 = 17 and 3 + 5 = 8, the value 17
not being 4 times the value 8. Ratio scales are a rare type of scale
in education and psychology and permit logarithmic transforma- 
tions of scale without any change in inter-personal relationships. 
Mels (pitch scales) and Sones (loudness scales) due to Prof.
S. S. Stevens are examples of such scales in psychology. Test
scores with which the student of education is frequently concerned
are of the interval class to which the mass of statistical technique
applies. It may be noted in passing that most ordinary statistical
work uses the mean of a sample as an arbitrary zero point
for its procedures, tables and monographs. Statistical manipulation
is capable of transforming one kind of scale data, on the basis
of certain assumptions (which should always be considered), into 
a shape permitting the application of statistical methods not appli- 
cable to the original data. For example a researcher may start 
with ranks and then transform them on the basis of certain assump- 
tions into interval types of values by a simple formula and then 
proceed to calculate mean, standard deviation and correlation. 
The rating type of material which is really of the nominal class is 
similarly transformed. This kind of jugglery does not improve
the quality of the original scale and is permissible only if the data 
meets the assumptions in each case. 
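The behaviour of interval and ratio relationships under a change of scale can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python, taking the three scale values 1, 2 and 4 used in the text for the lowest, middle and highest object; the function names are illustrative and not part of any statistical library.

```python
def linear_transform(values, m, c):
    """Apply the linear relation x' = m*x + c to every scale value."""
    return [m * x + c for x in values]


def difference_ratio(low, mid, high):
    """Ratio of the upper interval (mid to high) to the lower interval (low to mid)."""
    return (high - mid) / (mid - low)


def value_ratio(low, mid, high):
    """Ratio of the highest scale value to the lowest one."""
    return high / low


# Scale values for the lowest, middle and highest object, as in the text.
original = [1, 2, 4]
shifted = linear_transform(original, m=1, c=5)  # add 5 all over
scaled = linear_transform(original, m=2, c=0)   # multiply by 2 all over

for name, vals in [("original", original), ("shifted", shifted), ("scaled", scaled)]:
    print(name, vals,
          "difference ratio:", difference_ratio(*vals),
          "value ratio:", value_ratio(*vals))
```

The ratio of the two intervals stays at 2 under both changes, which is the interval property. The ratio of the values themselves survives the multiplication but not the addition of a constant, which is why a ratio scale needs a fixed zero.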

Measurement as a technique, in whatever form, is utilized by 
all the three scientific methods of research in education and psycho- 
logy, viz. the clinical, the experimental and those of individual 
differences. It is the manifold and varied use of all forms of 
measurement by the methods of individual differences that has 
led to a phenomenal development of statistical rationale and 
formulae in the present century. Although statistical analysis 
plays a significant, if minor, role in the other two methods, it is 
exemplified in its more elaborate and elegant techniques in the 
themes concerned with individual differences in measurable or 
classifiable attributes. In fact the two are so closely allied that 
it is possible and probably more appropriate to regard the methods 
of individual differences and those of statistical analysis as signifying
largely the same thing. The reason for this alliance is that
the experimental and clinical methods deal normally with a small
number of cases, whereas individual differences always imply 
large numbers. Since statistics is primarily a science which studies 
variation in large numbers of cases it finds the greatest scope for 
application in the field of individual differences. The researcher
in education is likely to be concerned with fairly large groups
which would favour the use of the individual differences approach
by statistical means. Statistics of a minor but crucial kind figure
in the other two methods as well in the form of the small sample
theory as we shall see later.


II ELEMENTARY STATISTICAL CONCEPTS 


Statistics which has in the present century made such telling 
contribution to the methodology of social sciences is now a pre- 
requisite of most scientific research. The reason for this is that 
observations which are the data of science are subject to errors 
which must be eliminated. Errors are of two kinds: constant 
or systematic due to defective methods of observation and record- 
ing which vitiate the data by introducing different degrees of 
bias; and random which are purely accidental and without any 
fixed directional tendency. Statistics provides estimates of the 
random error and since in the long run they tend to cancel out, 
makes it possible to reduce them. Apart from rendering this
common service, statistics is used by the experimental worker 
to test crucial hypotheses and the clinical worker needs it to judge 
individuals as deviants from a group norm. Modern research 
methods in education and psychology simply cannot be under- 
stood without a clear grasp of the essentials of elementary and 
more or less non-technical, common sense statistics. The 
following excursus is devoted to a preliminary clearing of the 
decks in this context. 

Statistics is the science of group phenomena. For statistics to
emerge we must work with large numbers of observations. As 
an instrumental science par excellence it studies by means of logic 
and elementary mathematics variation from individual to individual 
of either sharply differentiated attributes like sex or nationality or 
quantitative characteristics like height and age, which are infinitely
divisible into as fine gradations as one may want. Each variable 
whether quantitative or qualitative presents on the one hand the 
scale or categories and on the other the persons or observations which 
occur as frequencies in different categories or at successive points 
of the measuring scale. These distributions (of observations over 
qualitative classes or over equal intervals on the scale of measure- 
ment) are studied for central tendency and dispersion. The 
central tendency generally represents the group in its totality by 
a single central value. This could be the median or the mean 
or, for categorical data, the category of highest frequency known
as the mode. Dispersion is the spread of the observations and is
largely proportionate to the size of inter-individual differences.
Distributions are frequently represented graphically by figures 
such as the one shown below. 


[Figure: Curve of a distribution. Vertical axis: frequency; horizontal axis: scale of measurement.]


Tabulated data yield polygon figures with angularities which can 
be smoothed out to give a flowing curve. The height of the curve 
at any point of the scale indicates the frequency or number of 
persons obtaining that particular value. From this figure other 
characteristics of distributions of interest to the researcher become 
apparent. There will be the question of the symmetry of the curve. 
There will also be the question of the height of the curve relative to 
the width of its base. The researcher will also be concerned with 
the curve having a single peak or hump or having a wavy trend 
with two humps. Finally if for each case a pair of observations 
are possible or alternatively if there are pairs of observations such 
as for husband and wife, father and son, siblings, twins, a bivariate 
distribution is available and the researcher could study the correlation
between the two members of the pairs. If both variables increase
and decrease together in exact proportion a maximum correlation expressed
as r = 1 is obtained. If they increase and decrease contrary-wise, a
negative correlation of r = −1 will result. For no relation r = 0, with
various smaller than unity values representing the intermediate
degrees of direct or inverse relationship.

The concept of the percentile is concerned with the relative position
of an observation in a group. Instead of giving the exact rank of the
case he is spoken of as occupying a position among a hundred ranked persons,
whatever the actual number of observations in the group. Thus
to say that X has 25th percentile position means that he has 25
per cent of the persons in his group below him. Similarly the 50th
percentile is the value obtained by the middle case when all observations
are ordered from the lowest to the highest.

Finally statisticians use the concept of the normal probability distribution.
Every distribution has a mean and a standard deviation (not given in the
data but calculated therefrom). Each observation can be re-expressed
by the following equation


$$x_i = \frac{X_i - M_x}{\sigma_x}$$

where $X_i$ = any observation of case designated i
      $M_x$ = Mean of the distribution of variable X
      $\sigma_x$ = Standard deviation (a measure of dispersion) of
            the same distribution.


$x_i$ is a standard measure transformation of $X_i$. If we re-express all
observations given to us in various quantities made up of any 
arbitrary units like inches or centimeters or feet we shall obtain 
a standard measure form of the original distribution and, whatever 
the arbitrary original unit, e.g. inches or centimetres, the equiva- 
lent standard values will be exactly the same. The mean of such 
values will be zero, roughly half the observations carrying now 
negative signs, and the standard deviation will be unity. This 
occurs automatically as a result of the equation employed for the 
transformation. The concept of standard measure is basic to 
all fruitful statistical work. It enables us to refer any obtained 
distribution to the Normal Probability Table, to secure a measure 
of its departure from the ideal shape in which this curve
is conceived. The normal curve is also spoken of as the Gaussian
curve, so named after a statistician Gauss who established its usage
in the 19th century. As its name suggests the normal curve is an
ideal model which is approximated by all obtained distributions
of natural attributes and characteristics. The normal curve is
described by the equation


$$Y = \frac{N}{2.507\,\sigma}\; e^{-x^{2}/2\sigma^{2}}$$


where N = number of measurements
      Y = frequency
      $\sigma$ = standard deviation of the distribution
      x = X − $M_x$.


For given values of x (= X − $M_x$) the frequency Y in a perfect form
of distribution can be calculated. The normal probability table,
which is the first table to be included in any statistical text, should
be studied and understood. It serves chiefly three purposes:
firstly, to check the divergence of an obtained distribution from 
its normal form; secondly, to transform a given non-normal 
distribution into a parallel normal shape, should such a 
transformation seem necessary; and, lastly, to state frequencies for 
given scale values and scale values for given frequencies on the 
assumption of normality. 
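The claim that standard measures do not depend on the original unit, and the calculation of Y from the normal curve equation, can both be tried out directly. What follows is a minimal sketch in Python; the five heights are invented for illustration, the function names are hypothetical, and the standard deviation is taken in its population form (dividing by N), which is an assumption since the text does not specify the divisor. The constant 2.507 in the equation is the rounded value of the square root of 2π, which the code computes exactly.

```python
import math
from statistics import mean, pstdev


def standard_measures(observations):
    """Re-express each observation X as x = (X - M) / sigma."""
    m = mean(observations)
    sd = pstdev(observations)  # population form, an assumption of this sketch
    return [(x - m) / sd for x in observations]


def normal_frequency(x, n, sd):
    """Frequency Y of the normal curve at deviation x = X - M from the mean."""
    return n / (sd * math.sqrt(2 * math.pi)) * math.exp(-x ** 2 / (2 * sd ** 2))


# Hypothetical heights, first in inches and then converted to centimetres.
heights_in = [60.0, 62.0, 65.0, 67.0, 71.0]
heights_cm = [h * 2.54 for h in heights_in]

z_in = standard_measures(heights_in)
z_cm = standard_measures(heights_cm)

print([round(z, 3) for z in z_in])
print([round(z, 3) for z in z_cm])  # same values whatever the unit
print(round(mean(z_in), 6), round(pstdev(z_in), 6))

# Height of the curve at the mean for N = 100 measurements with sigma = 10.
print(round(normal_frequency(0, n=100, sd=10), 3))
```

The two lists of standard measures agree, their mean is zero and their standard deviation is unity, as the transformation guarantees.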

Correlation and association have many forms. There is the
familiar Pearson r. Rho (ρ) is calculated by a simple formula
for paired ranks and is of use in the ordinal type of data.¹ If
categories of the nominal type are the form in which paired frequencies
occur (e.g. three colours of ball are associated variously
with three sizes) the contingency coefficient indicates the degree of
association of the sharply distinguished attributes. This is a correlational
technique for qualitative data which does not show gradation
of quantity. The biserial type of correlation gives the relationship
between a variable that is dichotomous (i.e. has only
two categories) and another which is quantitative and continuous.
These are forms of the same kind of relationship each adapted
to the nature of the correlating variables. There are other applications
of the principle which provide for more distinctive situations.
$R_{0.123}$ stands for the multiple correlation of three quantitative
variables 1, 2 and 3 with a single one, 0. There is a system of weighting
(or applying different multipliers to each of the three) which
is known as regression weights and is capable of maximizing the
$R_{0.123}$. When three variables a, b, c are inter-correlated the partial
correlation of any two can be calculated so that the effect of the
third is held constant. If a represents scholastic merit, b, age and
c, size of shoe, it can well be that a and c are correlated entirely
due to the effect of age on both. Then if age is held constant
the $r_{ac}$ will fall nearly to zero.

¹ M. G. Kendall’s Rank Correlation Methods, 1955, should be consulted for recent
advances in this field.
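The shoe-size illustration can be worked through numerically. The sketch below, in Python, uses the usual first-order partial correlation formula, r(ac.b) = (r(ac) − r(ab)·r(bc)) / √((1 − r(ab)²)(1 − r(bc)²)), which is a standard result and is not quoted in the text. The eight cases are invented: merit and shoe size are each built from age plus an unrelated disturbance, so that age alone links them.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two paired lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_ac, r_ab, r_bc):
    """Correlation of a and c with b held constant (first-order partial)."""
    return (r_ac - r_ab * r_bc) / math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))


# Hypothetical data: b = age, a = scholastic merit, c = size of shoe.
age = [6, 6, 6, 6, 10, 10, 10, 10]
disturbance_a = [1, -1, 1, -1, 1, -1, 1, -1]  # unrelated to age
disturbance_c = [1, 1, -1, -1, 1, 1, -1, -1]  # unrelated to age and to disturbance_a
merit = [10 + 2 * b + e for b, e in zip(age, disturbance_a)]
shoe = [0.5 * b + e for b, e in zip(age, disturbance_c)]

r_ac = pearson_r(merit, shoe)
r_ab = pearson_r(merit, age)
r_bc = pearson_r(age, shoe)

print("r_ac   =", round(r_ac, 3))
print("r_ac.b =", round(partial_r(r_ac, r_ab, r_bc), 3))
```

In these made-up figures merit and shoe size correlate substantially, yet the correlation vanishes once age is held constant, which is the situation the paragraph describes.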

Each of these varied types of statistic is derived from a sample. 
If a second sample is taken by a strictly random process the value 
of the statistic is bound to change. This means that we cannot 
take the statistic derived from any single sample simply on trust 
as wholly correct and indicative of the true nature of things. If 
we take the trouble to draw a large number of samples and calcu- 
late any particular statistic over and over again for each new inde- 
pendent sample we will get a small distribution of these within 
a short range. This is called a sampling distribution of the statistic. 
In view of this phenomenon it is not safe to accept an obtained 
statistic at its face value. It is clear now that each statistic inclu- 
des within it a small component of error. Conceptually we could 
speak of the value obtained from a single sample being 


$$S_o = S_t + e$$

where $S_o$ = the value of the statistic obtained from a single
              sample
      $S_t$ = the true, error-less value of the statistic
      e = the error component of $S_o$.


This in brief is the concept of random error in statistics. Therefore
it is customary to state the liability to error every time a statistic
is calculated and quoted. The measure of error of statistics
is known as the standard error (the somewhat outmoded probable
error, PE, being only about two-thirds of this every time) and is to be
calculated for each statistic as an indicator of its reliability.
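The idea of a sampling distribution can be made concrete by actually drawing repeated samples. The following Python sketch is an illustration under stated assumptions: the population is simply the whole numbers 1 to 100, the statistic is the mean, samples are drawn independently with replacement, and the comparison figure σ/√n is the standard formula for the standard error of a mean, which the text does not itself quote. The factor 0.6745 relating the probable error to the standard error is likewise the standard value.

```python
import math
import random
from statistics import mean, pstdev


def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Draw many independent random samples and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    return [mean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]


population = list(range(1, 101))  # hypothetical population of scores
means = sampling_distribution_of_mean(population, sample_size=25, n_samples=2000)

empirical_se = pstdev(means)                        # spread of the sample means
theoretical_se = pstdev(population) / math.sqrt(25)  # sigma / sqrt(n)
probable_error = 0.6745 * theoretical_se             # about two-thirds of the SE

print("mean of sample means:", round(mean(means), 2))
print("empirical standard error:", round(empirical_se, 2))
print("theoretical standard error:", round(theoretical_se, 2))
print("probable error:", round(probable_error, 2))
```

The sample means cluster within a short range around the population mean, and their spread agrees closely with the calculated standard error.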

All the statistics so far described assume the interval type of 
measurement and are known as parametric. When observations 
are only of the nominal and ordinal type simpler statistics with 
less restrictive assumptions are used and are designated non-para- 
metric statistics. As rough checks with small numbers of cases 
they are quite useful and are therefore considered in some detail 
later. 
This short survey of common statistical concepts has been given
to familiarize the student with their general nature and purpose 
and to introduce terms which occur frequently in the course of 
discussion of research techniques. The student should acquire 
an elementary working knowledge of these statistics so that he 
has a clear understanding of the terms. The books recommended
for this purpose are Ferguson’s Statistical Analysis in Psychology and 
Education (Ferguson 1959) and Guilford’s Fundamental Statistics 
in Psychology and Education (Guilford 1950). 

With this very sketchy background of a merely theoretical understanding
of the necessary statistics we are now in a position, if only
just barely, to enter upon a consideration of the major
techniques of the method of individual differences. The term
method as used in this text includes the general approach, spirit
and strategy of attack on a problem whereas the term technique
is limited to the detailed procedure of carrying a specific technical
job through its various stages to a finish. We have seen that
the research situation could be multivariate when we are dealing
with a number of variables at the same time. The simpler situation
is where only one variable is subjected to analysis and examination.
Individual differences methodology takes many forms under
this condition some of which, as typical of a whole class, are described
in the following sections.


III SAMPLING

Sampling is fundamental to all statistical methodology of 
research. Bad sampling vitiates the data at the source and no 
amount of subsequent statistical fineness will improve its quality. 
Statistical error theory is concerned with random error but it 
cannot detect and separate spurious elements which enter into 
the configuration of the data at the source. It therefore behoves 
the scientific-minded researcher to take good care of sampling. 
In fact sampling is part of the strategy of research and has by now 
acquired the status of a technical job. Even simple random 
sampling is not made by just picking anybody who comes along 
but has to be safeguarded against inadvertent and unconscious 
bias and systematic error. Experts have drawn up lists of ‘random
numbers’ to make randomization easy. There are many
other varieties of sampling techniques to serve specific purposes.
‘Sampling design’ in fact means the joint procedures of selection
and estimation. Sampling should be such that the error of estimation
is minimum.

Population or universe means the entire mass of observations which 
are the parent group from which a sample is to be formed. This 
may be finite as when we say ‘the eight-year olds’ of a particular 
geographical area, or it may be infinite. In both cases complete 
survey of the entire population is rarely practicable. The para- 
meter is the true, errorless population value; as against this the 
population value is based on the entire population but includes errors 
of measurement known as non-sampling errors. The sample 
gives only an estimate of the parameter. The effort of the investi- 
gator is directed to reduce both sampling and non-sampling errors. 
Good sampling will help in this. 

Non-sampling errors are of two kinds: 

(1) The variable response errors, which are errors of observa- 
tion and slips in processing. Their elimination increases accuracy ; 
they are defined as 

$$\text{Imprecision} = e - t$$

where e is the estimate and t the true value.

(2) The errors of bias, which are systematic, directed and non- 
cancelling. Bias is defined as 

$$\text{Bias} = E(e) - t$$

where E(e) is the expected value of the estimate. Bias thus refers
to the deviation of the expected value of the estimate from the true
value and not a mere discrepancy between an estimate and the
true value. Precision means a small variability among estimates,
i.e. a reduction of the said discrepancies. We may get in a sample,
(1) Imprecision with bias or, (2) Precision with bias or, (3) Imprecision
without bias or, finally, (4) Precision without serious bias,
which is what will reduce both kinds of errors of sampling. A
graphic representation of errors of sampling has been offered and
is reproduced below.

[Figure: Errors of sampling. Labels: sampling error; errors of imprecision; bias; non-sampling error.]
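The distinction between bias and imprecision can be seen in a small numerical case. The sketch below, in Python, is purely illustrative: the true value and the two sets of repeated estimates are invented, the average of a handful of estimates stands in for the expected value E(e), and the standard deviation of the estimates is used as a convenient index of their variability.

```python
from statistics import mean, pstdev


def bias(estimates, true_value):
    """Deviation of the average estimate (standing in for E(e)) from the true value."""
    return mean(estimates) - true_value


def variability(estimates):
    """Spread of the estimates among themselves; small spread means precision."""
    return pstdev(estimates)


true_value = 50  # hypothetical true value t

# Hypothetical repeated estimates from two procedures.
scattered_unbiased = [42, 58, 45, 55, 50]  # centred on t but widely scattered
tight_biased = [54, 56, 55, 55, 55]        # closely grouped but off target

for name, est in [("imprecise, unbiased", scattered_unbiased),
                  ("precise, biased", tight_biased)]:
    print(name, "bias:", bias(est, true_value),
          "variability:", round(variability(est), 2))
```

The first procedure errs in both directions and averages out on the true value; the second is consistent but consistently wrong, and no increase in the number of estimates would cancel that error.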


Sampling Methods: The Probability Methods 


(1) Simple random sampling: This is the most popular, basic
method which is frequently assumed and misapplied. In this
every member has an equal and greater than zero chance of being
picked. This picking may be done (a) with replacement so
that a member can be selected more than once or (b) without 
replacement when each member can be selected only once. 

It is suggested that members should be replaced but not noted 
on second appearance (Kish 1953). It is risky to take randomiza- 
tion for granted. The use of random numbers is advisable. Most 
text books of statistics include their tables and indicate their use 
(Arkin and Colton 1950). 

The sampling fraction n/N for finite populations gives the pro- 
bability of selection of each member, n being the sample size and 
N the size of the population. If a finite population is not large
(e.g. post-graduate students of a college X or the women students 
offering biology in M.Sc. in a university) then the variance of the 
sample means needs to be ‘corrected for finite population’, and 
is given by 


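$$\sigma_{\bar{X}}^{2} = (1 - f)\,\frac{\sigma^{2}}{n}, \quad \text{where } f = \frac{n}{N} \text{ and } \sigma^{2} = \frac{\sum (X - \bar{X})^{2}}{N - 1}$$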


This correction is applicable where sampling has been ‘without 
replacement’. The lesson of this is to keep an eye on the sampling 
ratio in small finite populations for what you would lose by impro- 
ving that ratio is made up by increase in the reliability of the larger 
sample. 
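The effect of the finite population correction is easy to see in a small case. The following is a minimal sketch in Python, assuming a hypothetical population of forty scores and a sample drawn without replacement; since the population variance is unknown in practice, the sample variance is used here as its estimate, which is an assumption of this illustration.

```python
import random
from statistics import variance


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement, each with equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def variance_of_sample_mean(sample, population_size):
    """Estimated variance of the sample mean, corrected for finite population."""
    n = len(sample)
    f = n / population_size  # sampling fraction n/N
    return (1 - f) * variance(sample) / n


population = list(range(40, 80))  # hypothetical scores of N = 40 students
sample = simple_random_sample(population, n=10, seed=3)

print("sample:", sorted(sample))
print("uncorrected:", round(variance(sample) / len(sample), 3))
print("corrected:  ", round(variance_of_sample_mean(sample, len(population)), 3))
```

With a quarter of the population in the sample the correction removes a quarter of the variance, and when the whole population is taken the variance of the mean falls to zero.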

(2) Systematic sampling: For this the population has to be 
ordered serially. The method for this requires N/n to be computed
and rounded off to m, the nearest whole number, which is
‘the sampling interval’. An integer is selected at random from
1 to m. If this is designated x then the sample is formed by elements
definable as


$$x,\; x + m,\; x + 2m,\; x + 3m,\; \ldots,\; x + (n - 1)m.$$


If in the ordering of cases cyclic trend exists then the periodic 
beat of selection will pick up and absorb this cyclic trend and 
give wrong results. Also if the ordering is by size, beginning with 
a higher x will yield an inflated estimate. 
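The selection rule for a systematic sample can be sketched in a few lines of Python. This is an illustration only: the serially ordered population is simply the numbers 1 to 100, the function name is hypothetical, and positions that would run past the end of the list (which can happen when N/n is not a whole number) are dropped here as a simplifying assumption.

```python
import random


def systematic_sample(ordered_population, n, seed=None):
    """Take every m-th element after a random start x between 1 and m."""
    size = len(ordered_population)
    m = max(1, round(size / n))          # the sampling interval
    rng = random.Random(seed)
    x = rng.randint(1, m)                # random start, 1 to m inclusive
    positions = [x + k * m for k in range(n)]
    # Positions are counted from 1; any falling beyond the list are dropped.
    return [ordered_population[p - 1] for p in positions if p <= size]


population = list(range(1, 101))  # hypothetical serially ordered population
sample = systematic_sample(population, n=10, seed=1)
print(sample)
```

Because the population here is ordered by size, a higher starting value raises every selected element, which shows in miniature how the ordering can inflate an estimate.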

(3) Multistage sampling: In this type of sampling ‘the primary 
sampling units’ are inclusive groups, and ‘secondary sampling 
units’ are the sub-groups within these. Ultimate units to be 
selected are the elements, which must belong to one and only 
one group. Staging is, like stratification and clustering, a principle
of grouping in selection.

(4) Stratified sampling: Sampling is carried out with different
sampling fractions n/N within each stratum or group. The
stratification is on any principle of relevance to the sample.
There are three forms of this. In proportionate sampling a percentage
of each stratum is taken according to its share in the
population.

In optimum allocation the stratified samples are proportionate to
the size of the population within the stratum as also to its variability.
Finally there is the disproportionate stratified sample, of which an equal
percentage of draw regardless of size of the stratum is a special
case and example.
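The difference between proportionate and optimum allocation shows up as soon as the strata differ in variability. The sketch below, in Python, assumes two hypothetical strata with invented sizes and standard deviations; the optimum rule is implemented as allocation in proportion to stratum size times stratum standard deviation (commonly called Neyman allocation), which takes the cost of sampling to be the same in every stratum.

```python
def proportionate_allocation(stratum_sizes, n):
    """Share the sample of n among strata according to their population share."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}


def optimum_allocation(stratum_sizes, stratum_sds, n):
    """Share the sample in proportion to stratum size times stratum variability."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata: sizes and standard deviations of the variable studied.
sizes = {"urban": 600, "rural": 400}
sds = {"urban": 10.0, "rural": 30.0}

prop = proportionate_allocation(sizes, n=100)
opt = optimum_allocation(sizes, sds, n=100)

print("proportionate:", prop)
print("optimum:      ", {h: round(v, 1) for h, v in opt.items()})
```

The smaller but more variable stratum receives the larger part of the sample under optimum allocation, whereas proportionate allocation follows population share alone. In practice the resulting figures would be rounded to whole cases.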

(5) Cluster sampling: This involves a complete count of 
selected intact groups. 

It is possible here to stratify clusters by selecting them from 
strata into which they are sub-divided. Sometimes instead of
the intact cluster being taken a random segment only is picked 
up. Sub-sampling is the name given to taking a random or syste- 
matic sample from within a cluster. 

Stratification and sub-sampling can give a very efficient sample 
in sociological research. The difference between multistage,
stratified and cluster sampling has been brought out by the following
tabular classification (Ackoff 1953):

Stage I             Stage II              Sample Type
Selecting groups    Selecting elements

Sample              Sample                Two stage random
Complete count      Sample                Stratified
Sample              Complete count        Cluster


(6) Repetitive or multiple sampling: In double sampling one
sample is analysed, and the information obtained is used to draw
the next sample to examine the problem further.

In group sequential sampling samples of size K are drawn
repeatedly and pooled until a decision is reached.

Sequential sampling figures under this class and has been 
examined in detail in Chapter V. 


The Non-Probability Methods


(7) Judgement sampling: Probability samples are designed to
be representative. If for any reason in the researcher’s judgement 
a particular subgroup or stratum is satisfactorily representative 
its selection illustrates the principle of judgement sampling. Pilot 
studies could be based on such samples. They furnish the staple 
of what are known as ‘test cases’ or ‘typical cases’. But by and 
large the reliance on intuition and ‘hunch’ is risky. 

(8) Quota sampling: This combines judgement and probability 
procedures. The population is classified into several categories; 
on the basis of judgement or assumption or previous knowledge 
the proportion of population falling into each category is decided. 
Thereafter a quota of cases to be drawn from each is fixed and the 
observer allowed to sample as he likes. Quota sampling is very 
arbitrary and likely to figure in municipal surveys.

These varieties of sampling designs show that the problem is
not automatically solved for each specific enquiry. Good,
unexceptionable sampling design is bound to improve the quality
of the entire research. Before launching on the campaign for
data it is necessary to consider the question of sampling. Ackoff
(1953) gives a good survey with an evaluation of the merits of
sampling methods. Kish (1953) also provides a very lucid
discussion, and may be consulted with profit.


IV TESTING OF HYPOTHESIS: THE t AND THE χ²
Postulate, Assumption, Hypothesis


The terms hypothesis, assumption and postulate occur frequently 
in research literature but are often confused by students and need 
a word of explanation. Postulates are the working beliefs of most 
scientific activity. The area of credence, of undisputed acceptance 
as true or as factual, within which the force of evidence and logic 
must work to produce further trustworthy results is what we mean 
by the term postulate. The mathematician begins by postulating 
a system of numbers which range from 0 to 9 and can permute 
and combine only thereafter. A man from Mars can object to 
ending the conventional series of numerals at 9 and can even 
propose a new tenth numeral for which we have as yet no single 
symbol, 10 being a mere extension of the old system. The rules
of chess are a set of postulates in the sense of being the primary 
conditions of the game. The non-interference of the Divinity in 
the results of an experiment is frequently an unspoken postulate 
of science. That one is not mad and must trust one’s experience 
must be a postulate of undertaking any research at all. In fact 
life is run on a set of principles we consider as implicit in the human
situation. With many people God is a postulate of the good
life or at least the godly life. Postulates are not proven; they
are simply accepted at their face value so that on their basis work 
for the discovery of other facts of nature can begin. 

Assumption means simply taking things for granted so that 
the situation is simplified for logical procedure. Assumptions are 
not the very ground of our activity as the postulates are. They
merely facilitate the progress of an argument through a partial 
simplification by introducing restrictive conditions. For example, 
by assuming that random errors are not correlated we can solve 
a number of equations in test theory which otherwise will defy 
solution. The Spearman-Brown Prophecy formula and the
Kuder-Richardson Reliability formula 20 are based on a number
of assumptions. If these assumptions are not met by the data
the formula will not work. Assumptions mean restrictive conditions
before the argument can become valid. Assumptions are made
on the basis of logical insight and can be verified in the data. The 
researcher may find the going hard because known facts and 
logical insights are few and unknown factors many. From his 
experience and acumen he may launch an assumption which 
means that a few of the more obstructive of the unknown or at any 
rate untested factors are brought within control and the argument 
is allowed to proceed. It will now be seen that postulates are 
the basis and form the original point of an argument whereas
assumptions are a matter of choice and the less use we make
of them the more free will our argument be as a general 
proposition. 

Hypothesis is different from both these. It is frequently spoken 
of in connection with experimental procedures. The word 
hypothesis is made up of two Greek roots which mean that it is 
some sort of a ‘sub-statement’; for, it is the presumptive statement
of a proposition which the investigation seeks to prove. It has 
been said earlier that science seeks to describe phenomena by 
means of condensed generalizations. The generalization requires 
a knowledge of principles of things or those essential characters 
which pertain to entire classes of phenomena. The discovery of 
such principles or essential characters is the business of scientific 
research. Aimless groping is wasteful. Human insight enables 
man to generalize from his own experience howsoever inadequate. 
Popular wisdom consists of such insights into life. The scientist 
too observes the mass of a special class of phenomena and broods 
over it until by a flash of insight he perceives an order and an 
intelligent harmony in it. This is often referred to as an ‘explanation’
of the facts he has observed. He has a ‘theory’¹ about
that particular mass of facts. This theory when stated as a testable
proposition formally and clearly and subjected to empirical or 
experimental verification is known as a hypothesis. The hypo- 
thesis furnishes the germinal basis of the whole investigation and 
remains to the end its corner stone, for the whole research is 
directed to test it out by facts. The entire value of a research 
depends on how good the original hypothesis with which one 
started is. It is in the light of the hypothesis that the relevance
of data to be collected is judged. Since the entire research is
staked on the hypothesis it needs to be a good one with a high
probability of coming out true, although the rejection of a hypothesis
is also a positive result in an eliminative sense. At the start
of an investigation the hypothesis is a stimulus to critical thought
and offers insight into the confusion of phenomena. At the end
it comes to prominence as the proposition to be accepted or rejected
in the light of the findings. In between these stages it
furnishes the worker with the sign posts for the progress of the
investigation.

¹ The term theory has a technical meaning of its own which does not concern
us here.


The Null Hypothesis 


Occam’s razor is the name given to a principle or canon of 
economy in scientific explanation which requires that for a given
set of observations the fewest and simplest of generalizing principles
or factors should be offered as against more complex and numerous 
ones. Statistics therefore has proposed a class of minimum hypo- 
thesis which should have the first opportunity to account for a set 
of observations. This minimum hypothesis is a negative hypothesis 
because it denies the existence of any systematic factors or principles 
apart from the effect of chance. This class of hypothesis is known 
as the Null Hypothesis so called because it ‘nullifies’ the positive 
argument of the findings. It is used to apply Occam’s razor to 
cut down the number of explanatory principles to a bare minimum. 
In effect it tries to explain away the observed trends, fluctuations 
and configuration of the data as due merely to chance. If varia- 
tions due to chance do not account for the peculiarities and vagaries 
of the data then we may accept the operation of other systematic 
factors on them. We can thus conceive of two kinds of hypothesis, 
a minimum hypothesis which tries to account for the observed 
nature of the phenomena by chance, and the other which a worker 
may choose by his insight into the nature of the phenomena. 

The t: The simplest application of the Null Hypothesis is 
against observed data collected or against the hunch of a rival 
hypothesis which might be called non-Null. For example, one
may suspect that the ‘space factor’ as a cognitive function is weak 
in girls. Possibly in one’s experience there has been some evidence 
of this. Alternatively we could just ask the simple question 
whether this function is equally good in the two sexes. The mean
scores for boys and girls on a ‘space factor’ test are the observed
data. We have thus

$$N_1 = N_2 = 100$$


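The standard error of difference between means is given by the
formula:

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$

where $\sigma_{M_1} = \sigma_1 / \sqrt{N_1}$ and $\sigma_{M_2} = \sigma_2 / \sqrt{N_2}$,
$N_1$ and $N_2$ being the number of cases in the two groups.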


Applying the formula we find S.E. of difference between boys’
and girls’ means to be 1.41, correct up to two decimal places.
We next proceed to calculate the Critical Ratio, t¹, which is the ratio
of the obtained difference in means to the standard error of that
difference.

$$t = \frac{50 - 40}{1.41} = 7.1$$


If the t is greater than 1.96 the difference is said to be significant
at 5 per cent level of confidence and if it passes beyond 2.58 it is
significant at 1 per cent level of confidence. This terminology
simply means that the probability of obtaining such a size of
difference for samples of the size used, by random sampling, is
5 and 1 per cent respectively at C.R.=1.96 and 2.58.
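
The calculation of the Critical Ratio can be sketched as follows. The means of 50 and 40 and the 100 cases in each group are those of the example; the standard deviation of 10 for each group is an assumption, chosen because it reproduces the standard error of 1.41. The function names are introduced here for illustration only, and the probability is read from the normal curve, which suits large samples.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a mean: the standard deviation over the root of N."""
    return sd / math.sqrt(n)


def critical_ratio(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Return the critical ratio and the standard error of the difference."""
    se_diff = math.sqrt(
        standard_error_of_mean(sd_1, n_1) ** 2
        + standard_error_of_mean(sd_2, n_2) ** 2
    )
    return (mean_1 - mean_2) / se_diff, se_diff


def two_tailed_probability(cr):
    """Chance of a ratio at least this large, in either direction,
    under the normal probability curve."""
    return math.erfc(abs(cr) / math.sqrt(2))


# Means and group sizes are from the example; sd = 10 per group is assumed.
cr, se_diff = critical_ratio(50, 40, 10, 10, 100, 100)
print(round(se_diff, 2), round(cr, 2))
print(round(two_tailed_probability(1.96), 3))
print(round(two_tailed_probability(2.58), 3))
```

The last two lines check the conventional limits: a ratio of 1.96 corresponds to a chance of about 5 per cent and a ratio of 2.58 to about 1 per cent.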

Standard errors of differences between obtained values of means
of correlated variables, or proportions, percentages, frequencies,
standard deviations and correlations are found by different
formulae and used to obtain a t value which is evaluated in the
manner described above. The Null Hypothesis in these cases is
that there is no difference between the obtained statistic of the
pair of variables and that the observed difference is accidental
and due to chance. When the difference becomes ‘significant’
by reason of the t crossing the 1.96 and 2.58 values, the Null
Hypothesis is ‘over-thrown’ and our conclusion is that a difference
does exist, the amount of which cannot be, of course, specified.
The Null Hypothesis is a negative hypothesis in the sense that
it only negates the probability of the true difference being zero.
It makes no positive statement by itself.

¹ For small samples the t Table instead of the Normal Probability Table is
used for the C.R.

The Chi-square: Similarly if a frequency distribution over a
number of categories is given such as:

Frequencies of responses on a five point scale to the statement:
‘Music is essentially an idle pastime’:

                           Agree      Agree      Indif-     Disagree   Disagree
                           strongly   slightly   ferent     mildly     strongly

f                             5          8          8          6          3
f under Null Hypothesis       6          6          6          6          6


This is again a case of checking the obtained data against 
the Null Hypothesis which states that actually the responses 
are divided equally over the five points of the rating scale. 
The divergence of any hypothesis from observation is measured
by a statistic known as the Chi-square. The equation for
Chi-square is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where Σ = Greek capital for S known as sigma indicating a summation
process over all the categories (in the above example, five),
$f_o$ = observed frequency and $f_e$ = frequency expected on the basis
of Null Hypothesis. By this formula Chi-square for the category
‘Disagree strongly’ comes to

$$\frac{(3-6)^2}{6} = \frac{9}{6} = 1.5$$


The Chi-square summated for all the five categories comes to
.17+.67+.67+0+1.50=3.01. Textbooks of statistics include a
table for χ² which is entered with degrees of freedom which are

$$n - 1$$

and in case of two variables shown together in a scatter diagram

$$(n_1 - 1)(n_2 - 1)$$

where n = the number of categories

$n_1$ and $n_2$ = the number of categories for variables 1 and 2.
The obtained χ² is compared with entries in the row. The
column heads show P or probability of any given size of χ² arising
for the given number of categories by chance. For d.f. (degrees
of freedom) 5-1 or 4, values between which 3.01 occurs are
of the following kind:

d.f.  |  P = .70    P = .50
4     |  2.195      3.357

This is interpreted to mean that the chances are anywhere from
50 to 70 per cent that χ² of 3.01 would arise by chance in
categories of an order which gives d.f. of 4.
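
A minimal sketch of the computation for the five point scale above, assuming equal expected frequencies under the Null Hypothesis. The function name `chi_square` is introduced here for illustration. Worked directly from the tabulated frequencies (5, 8, 8, 6 and 3 observed against 6 expected in each category) the five contributions are .17, .67, .67, 0 and 1.50.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Observed frequencies from the five point scale in the text.
observed = [5, 8, 8, 6, 3]

# Null Hypothesis: responses are divided equally over the five points.
expected = [sum(observed) / len(observed)] * len(observed)

# Contribution of each category, rounded to two places for display.
contributions = [
    round((fo - fe) ** 2 / fe, 2) for fo, fe in zip(observed, expected)
]
chi2 = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom: categories less one

print(contributions)
print(round(chi2, 2), df)
```

The obtained value is then looked up in the row of the χ² table for the printed number of degrees of freedom.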

Chi-square is a useful statistic in testing all departures of obtained 
data from their distributions expected under the operation of 
chance factors alone. If the data have been collected on the 
expectation that they possess systematic tendencies (as in the case 
of sex differences in ‘space factor’ mentioned earlier), the Chi- 
square test will show them to be within limits of chance fluctuation 
or beyond it. Chi-square values which lie in columns between 
P’s of .05 and .01 show ‘significant’ departures in a statistical
sense from chance distributions at the usual levels of confidence. 
The bigger the Chi-square the less likely it is that the configuration 
of the data is due entirely to chance. 

The Chi-square can also be used to evaluate the departure of 
a distribution on a quantitative variable from normality. With 
the help of the normal probability table and using the obtained
mean and standard deviation, frequencies expected in each interval
on the presumption of normal distribution are found. The
obtained frequencies and these expected ones then enable χ² to
be calculated. The conclusion can then be drawn, whether
or not the distribution as obtained has departed from normality
significantly. All these are examples of tests of various hypotheses 
including the minimum statistical hypothesis known as the Null 
Hypothesis. If, therefore, the researcher has a theory which 
he wishes to test against objectively secured data or if he has 
obtained data the shape of which departs materially from chance 
distribution he would find the Chi-square a useful tool. A closer 
acquaintance with this yeoman statistic is recommended. Before 
leaving the topic it may be remarked that Chi-square is based 
on frequencies and not on the difference between two means or 
other parametric estimates read on a quantitative scale. Enthusiasts
have been known to use differences in scale values for Chi-square
test and this is not permissible.


V ANALYSIS OF VARIANCE 


Consider a situation frequent enough in research where instead 
of one difference between two means, many differences between 
many means are compared. In a hypothetical case localities 
are rated into four classes from best to worst according to their 
social environment. A criterion test of reading is given to four 
random samples of 10 boys each from the localities. We would 
get now four means $M_A$, $M_B$, $M_C$, $M_D$. If we test the ‘no
difference’ hypothesis we must not be tempted to apply the Chi-square
because these are parametric estimates and not frequencies.
The Critical Ratio can be applied but the comparisons are, pair-by-pair,
6. The formula for determining the possible number
of pairs of n things is

$$\frac{n(n-1)}{1 \times 2}$$


The intention in this case seems to be to establish the differences 
as real or not large enough to be significantly different from those 
produced by chance. One interesting result could be that 
whereas adjacent means (when ordered by size) show no significant 
difference, widely separated ones do. If the intention is to find 
out the significance of overall group differences then the labour 
of two-by-two comparisons may be curtailed and a summary
judgement obtained by the analysis of variance. Variance is
the standard deviation of a distribution, squared. If

$$\sigma \ \text{(standard deviation)} = \sqrt{\frac{\sum (X_i - M_x)^2}{N}}$$

then

$$\sigma^2 \ \text{(variance)} = \frac{\sum (X_i - M_x)^2}{N}$$

and

$$\text{S.S. (Sum of squares)} = \sum (X_i - M_x)^2$$

where Σ = the summation sign over all N values of X
$X_i$ = each X value in turn
$M_x$ = Mean of variable X
N = Number of cases in the distribution.


The following example will make the meaning clear : 


Value of X      Deviation squared

5               (5-4)² = 1² = 1
3               (3-4)² = 1² = 1
1               (1-4)² = 3² = 9
9               (9-4)² = 5² = 25
2               (2-4)² = 2² = 4

Σ = 20          Σ = 40

$$M_x = \frac{20}{5} = 4$$

$$\sigma = \sqrt{\frac{40}{5}} = 2.83 \quad \text{and} \quad \sigma^2 = \frac{40}{5} = 8$$

which is the square of 2.83 except for the rounding off. For tabulated data
there are other formulae which simplify computational work.
Since standard measures have standard deviations of unity,
the variance of such measures is also unity. The formulae
given above apply to gross measures in their original form.
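
The three quantities can be checked for the five values of the example by a short sketch. The function name `sum_of_squares` is an illustrative choice; the divisor is N, as in the formulae of the text.

```python
import math


def sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


values = [5, 3, 1, 9, 2]          # the five X values of the example
ss = sum_of_squares(values)       # sum of squares
variance = ss / len(values)       # variance, with N as the divisor
sd = math.sqrt(variance)          # standard deviation

print(ss, variance, round(sd, 2))
```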

Variance has a quality which makes it specially useful. It 
has an additive property which the standard deviation with its 
square rooting does not possess. Variance on this account can 
be added up and broken down into component parts. 

The expression $\sum (X_i - M_x)^2$ gives what is known as ‘the sum
of squares’, being the sum of deviations from the mean, squared.
If our four means for our groups A, B, C, D are based on
random samples of 5 persons each we get for a hypothetical
example:


                    Group
          A      B      C      D
          5      1      8     12
          3      0      9     14
          2      2      4     10
          6      2      5      7
          4      0      4      2
M_g       4      1      6      9      M_t = 5


where $M_g$ is the group mean for each column and $M_t$ is the
grand mean of all twenty cases. We observe two kinds of variances
here. There is the variance of the members of each group around
its group mean e.g. for column A the sum of squares is (5-4)²
+(3-4)²+(2-4)²+(6-4)²+(4-4)²=10; and similarly for B, C
and D. As against this there is variance of group means around
the grand mean, the sum of squares for which is (4-5)²+(1-5)²
+(6-5)²+(9-5)²=34. Simple inspection shows that group
means vary considerably from each other and members of each
column vary among themselves. There are two extreme forms
this table can assume, one where the variance within columns
disappears as in the table below:


          A      B      C      D
          4      1      6      9
          4      1      6      9
          4      1      6      9
          4      1      6      9
          4      1      6      9
M_g       4      1      6      9      M_t = 5


all variance being that of the group means around the grand 
mean. The other form is where the variance of the group means
disappears as in the table below: 


          A      B      C      D
          6      5      7      8
          4      4      8     10
          3      6      3      6
          7      6      4      1
          5      4      3      0

M_g       5      5      5      5      M_t = 5


We have thus altogether three situations: 


(1) Where individuals within each group differ or vary and
the group means also vary.

(2) Where individuals within each group are alike, only group
means vary.

(3) Where group means are alike and only individuals within
each group vary.


The total variance is the variance of the twenty individuals 
around the single grand mean. 

In situation (2) Total S.S.=S.S. of group means magnified
n times, n being the number of cases in each group. In
situation (3) Total S.S.=Sum of within group S.S. for all
groups. The total S.S. represents a random sample of the variance
as it exists among members of the parent population from which
the groups are drawn. In the above two situations we ascribe
this entire variance either to differences within each group ruling
out inter group mean differences or differences among group
means suitably multiplied by n to magnify the variance to the
sample size of 4×5=20. Symbolically

$$S_t^2 = S_b^2 \ \text{when} \ S_w^2 = 0$$

$$S_t^2 = S_w^2 \ \text{when} \ S_b^2 = 0$$

where $S_t^2$ = total S.S., $S_w^2$ = S.S. within groups and
$S_b^2$ = variance between group means multiplied by n, the
number of individuals in each group.

The total variance is a sample of the population variance and
in the two extreme cases where it is equal to $S_b^2$ or $S_w^2$, these latter
become as good estimates of the variance or inter-individual
differences in the parent population. If the within-groups
variance remains large, groups are random sub-samples of the
total sample of 20 in the given case, and the hypothesis of systematic
inter-group differences falls to the ground. The two sample
estimates are therefore compared by a ratio of which the estimate
via group means is the numerator and the estimate via the within
groups method forms the denominator. Symbolically

$$F = \frac{S_b^2}{S_w^2}$$

The hypothesis is that $S_w^2$, the variance within groups ($S_b^2 = 0$)
as an estimate of the population variance is large enough to give
a small value of F and reduce the probability that group means
differ significantly and indicate systematic differences in groups.
The table for F is part of most statistical texts and shows, in its 
body against columns of degrees of freedom for greater and rows 
of degrees of freedom for smaller variance, two values of F, one 
for 1 per cent and the other for 5 per cent level of significance. 

The degrees of freedom are one less than the number of cases
or number of groups. For example the d.f. for within variance
in the present case is one less for each group i.e. 5-1 for each
group; since there are 4 groups this comes to

4(5-1) = 16

Similarly for the between groups variance the d.f. is

4-1 = 3

The sum of squares divided by these d.f. give the variance for the
F test.


Source            d.f.           Sum of Squares    Variance
Within groups     16             124               7.75
Between groups     3             170               56.67
Total             19 (= 20-1)    294

$$F = \frac{56.67}{7.75} = 7.31$$


The table entries for 3 d.f. (greater variance d.f.) and 16 d.f.
(smaller variance d.f.) are:

F = 3.24 at 5% level of confidence
F = 5.29 at 1% level of confidence


Since 7.31 exceeds both values of F we conclude that the differences
between groups as indicated by their means are systematic and
substantial and not such as could be attributed to chance only.
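
The whole analysis for the four groups can be retraced in a short sketch. The scores are those of the hypothetical example as read from the table, entries that are illegible in the print being inferred from the column means; the function names are illustrative and not part of any library.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for a one-factor grouping."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)

    # Variation of members of each group around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    # Variation of group means around the grand mean, magnified by n.
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((x - grand_mean) ** 2 for x in scores)

    df_within = len(scores) - len(groups)
    df_between = len(groups) - 1
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_within": ss_within,
        "ss_between": ss_between,
        "ss_total": ss_total,
        "df_within": df_within,
        "df_between": df_between,
        "F": f_ratio,
    }


groups = {
    "A": [5, 3, 2, 6, 4],
    "B": [1, 0, 2, 2, 0],
    "C": [8, 9, 4, 5, 4],
    "D": [12, 14, 10, 7, 2],
}
result = one_way_anova(groups)
print(result["ss_within"], result["ss_between"], result["ss_total"])
print(result["df_within"], result["df_between"], round(result["F"], 2))
```

The within and between sums of squares add up to the total, which is the additive property of variance on which the method rests.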

This type of analysis enables us to evaluate the differential 
effect of any factor or principle of grouping and to decide whether 
groups are significantly different or fairly homogeneous. 

Analysis of variance begins to pay off increasing dividends as 
we increase factors or principles of grouping. Such designs are 
known as ‘factorial designs’ and naturally involve increased 
computational labour. Going through an illustrative example in 
a text will make the steps clear. The examples worked by Guilford 
(1950) are recommended for the purpose. If there are r categories 
of one factor and k categories of the other then

$$k \times r = \text{number of groups}$$


which could be randomly formed so that each category of one 
factor combines once with each category of the other. If k=3 
and r=4, then 12 sets are formed by pairing each category k with 
each category r. 

Working with one factor of grouping at a time we could now 
get three estimates of the population variance: 


Source                  d.f.
Between rows (r)        r-1
Between columns (k)     k-1
Within sets (n)         N-rk = rk(n-1)
Total (N)               N-1

where n is the number of cases in each set or group. There is of
course the total sum of squares with N-1 degrees of freedom.

There now appears a fourth source of variance and this is the
S.S. (sum of squares) for interaction of factors r and k, with d.f.
(r-1)(k-1). This is the S.S. for the means of r×k sets and
is the difference between the total S.S. and the sum of r, k, and n
representing the S.S. for ‘between rows’, ‘between columns’ and
‘within sets’. Symbolically using square sign for S.S.:

$$T^2 - (R^2 + K^2 + W^2) = I^2$$

where the letters represent the S.S. from
sources: total, rows, columns, ‘within’ sets or ‘remnant’ and the
interaction. We are now making four separate estimates of the
population variance:

(1) By way of the means of categories under the factor given
by the rows.

(2) By way of the means of categories under the factor given
by the columns.

(3) By way of the means of all r×k sets formed by crossing
the r categories of one factor with k categories of the other.

(4) By way of n cases in each of r×k sets which acts as the
metric for the test of significance.

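There are three F tests, each S.S. being first divided by its d.f.
to give the variance:

$$F_r = \frac{R^2}{W^2}, \qquad F_k = \frac{K^2}{W^2}, \qquad F_i = \frac{I^2}{W^2}$$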


The significance is read as earlier from the F table, the d.f. for I²
being (r-1)(k-1); significant F’s show the factor concerned to
be operative in causing differences among individuals which 
cannot be explained by chance. Significant interaction means 
that the factors are so correlated that when operating together they 
produce significant differences in the set means. 
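
The breaking down of the total sum of squares into rows, columns, interaction and within sets may be sketched as below. The scores are purely hypothetical (2 row categories, 2 column categories and 2 cases in each set) and the function `two_way_anova` is an illustrative name; equal numbers of cases in every set are assumed.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """cells[i][j] holds the scores for row category i and column category j.

    Every set is assumed to contain the same number of cases.
    """
    r, k, n = len(cells), len(cells[0]), len(cells[0][0])
    scores = [x for row in cells for cell in row for x in cell]
    grand = mean(scores)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(k)]

    ss = {
        "total": sum((x - grand) ** 2 for x in scores),
        "rows": k * n * sum((m - grand) ** 2 for m in row_means),
        "columns": r * n * sum((m - grand) ** 2 for m in col_means),
        "within": sum(
            (x - mean(cell)) ** 2 for row in cells for cell in row for x in cell
        ),
    }
    # Interaction is what remains of the total after the other three sources.
    ss["interaction"] = ss["total"] - (ss["rows"] + ss["columns"] + ss["within"])

    df = {
        "rows": r - 1,
        "columns": k - 1,
        "interaction": (r - 1) * (k - 1),
        "within": r * k * (n - 1),
    }
    variance = {source: ss[source] / df[source] for source in df}
    f_ratios = {
        source: variance[source] / variance["within"]
        for source in ("rows", "columns", "interaction")
    }
    return ss, df, f_ratios


# Hypothetical scores, chosen only to keep the arithmetic simple.
cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]
ss, df, f_ratios = two_way_anova(cells)
print(ss)
print(df)
print(f_ratios)
```

Each of the three F ratios is then referred to the F table with its own d.f. against the d.f. for within sets.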

The number of factors responsible for grouping can be increased 
at the cost of much greater computational labour. With three 
factors X, Y, Z we would get the following S.S. and variances:
X, Y, Z, X×Y, X×Z, Y×Z, X×Y×Z and W.


In this more complicated case we again explain the total variance 
as being largely due to any single factor, a joint operation of any 
two of them and finally the joint operation of all three, all these 
effects being evaluated in terms of the within-sets variance, the 
number of sets being the multiplication of the numbers of
categories in each of the three factors. In a hypothetical situation
we may have differences due to sex (2), social class (say 3) and
age (say 4 levels), then the sets will be 2×3×4=24.

As factors of grouping multiply the sources increase rapidly 
and calculations become prohibitive. Also the sources divide up 
significance in the most intricate patterns and the net result is 
less clear and satisfactory. The higher order interactions coming 
out as significant increase difficulty of interpretations and with 
increasing factors of classification the number of sets increases so 
fast as to make the availability of data extremely difficult. The 
research worker is however advised to hold the analysis of variance 
in high esteem. Fisher, one of the leading statisticians in the world 
today, has presented to us in this technique one of the most powerful 
tools of critical analysis of classified data. The assumption implied
in its application is that the distributions within the sets are of the
normal shape and their variances approximately equal. Since only a
small number of cases go into each set it is not possible to check this
condition empirically; the assumption may therefore be taken as applying to
the nature of the population from which strictly random selection
of sub-samples of sets must be made. It may be noted at the
end that basically analysis of variance is a simple or manifold 
analysis of relationship between factors of classification, which is
not very different from the technique of correlation analysis or the
comparison of means. The objective of all three procedures is
to test the degree of relationship between variables categorical
or quantitative. For example, if the means of the two sexes on
a test are significantly different, it means, in terms of analysis of
variance, that the factor of sex produces significant variation in
test scores between sex-groups, and in terms of the correlational
technique that the relationship between sex and the test performance
is significant. The analysis of variance has a generalized
application which makes it extremely useful to the research worker 
in psychology, education, medicine and agriculture. The student 
studying the simple and combined effects of various factors of 
classification up to, say, 5 is advised to consider the possibilities
of this technique in his experimental design, and thereafter study 
in detail the procedure outlined in the texts recommended. The 
varied and sometimes very advanced and complicated applications 
of the basic principle of analysis of variance are generally subsumed 
in statistical and research texts under the generic term ‘designs of
experiment’. We shall have occasion to consider these again
under ‘the field experiment’ in Chapter V.


VI PSYCHOPHYSICS AND SCALING 
Classical Psychophysics 


An academic study acquires the status of a scientific discipline 
to the extent it employs measurement techniques and experimental 
procedures. Psychophysics represents the initial push of psycho- 
logy in this direction. The classical psychophysics of Weber- 
Fechner has constituted for over a hundred and fifty years the 
standard method of experimentation in psychology, and it belongs 
contextually to Chapter VI. It is taken up here because it precedes
and contrasts with the modern psychophysics of Thurstone (1927) 
which leads to the subject of scaling. Psychophysics seeks to 
establish a parallelism between physical stimuli and their psycho- 
logical correlates. The psychophysical principle known as the 
Weber-Fechner law states that ‘sensations are proportionate to 
the logarithm of their exciting stimuli’. Algebraically this rela- 
tionship is expressed in the famous equation: 


$$S = K \log R$$

where S = sensation
K = a constant in the case
R = stimulus (Reiz).


(In American usage S=stimulus and R=response or sensation.) 
The classical psychophysics dealt with the quantitative aspects of 
sensory phenomena (such as intensity, extensity and duration 
of sensations) and investigated problems of thresholds—absolute, 
where the sensation is just felt and differential, where one stimulus 
is perceived as just different and no more from another, and also 
problems of apparent equality of two stimuli so that the error due 
to constant tendency to wrong estimation and the error due to 
the order of presentation of two stimuli may be calculated. Thres- 
hold problems are served by the psychophysical methods known 
as the Constant Method and the Method of Limits. Problems 
of apparent equality are investigated by the Constant Method 
and the Method of Mean Error. The Method of Limits is the 
simplest. In the absolute threshold the intention is to find a
value of the physical stimulus which is just perceived by the subject
(=‘observer’ of some American texts). This is simply done by 
starting with a low value and gradually increasing the value of 
the stimulus until the first apprehension of sensation takes place. 
In the descending series a high, well perceived value of the stimu- 
lus is taken as the starting point and the stimulus decreased until 
no sensation results. The average of the two stimulus values 
where sensation first takes place (on the ascending series) and 
where it is no longer felt (on the descending series) gives the abso- 
lute threshold for that modality of sensation. In case of differen- 
tial threshold one stimulus value is held constant and used as 
the standard. Another source of identical stimulus is used to 
vary the values which are to be compared with the standard. 
The variable values are changed upwards and downwards well 
past the standard, presented simultaneously or one after the other, 
and repeatedly. One point above the standard and another 
below the standard are found, by averaging as before, where the 
variable stimulus is just perceived as greater than and less than the 
standard. These points are the differential thresholds for the 
standard value i.e. it is at these distances on the stimulus scale 
that the subjects perceive a difference in the quantity of
the sensation.
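
The averaging used for the absolute threshold in the Method of Limits can be shown in a few lines. The stimulus values are hypothetical readings from three ascending and three descending series, and the function name is an illustrative choice.

```python
from statistics import mean


def absolute_threshold(first_felt, no_longer_felt):
    """Average of the ascending and descending transition points.

    first_felt: stimulus values at which sensation first appeared
                (one value from each ascending series).
    no_longer_felt: stimulus values at which sensation ceased
                    (one value from each descending series).
    """
    return (mean(first_felt) + mean(no_longer_felt)) / 2


# Hypothetical readings in arbitrary stimulus units.
ascending = [10, 11, 12]
descending = [8, 9, 7]
threshold = absolute_threshold(ascending, descending)
print(threshold)
```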

The Method of Mean Error demonstrates the error due to the 
order of presentation of two stimuli and is of methodological signi- 
ficance. If by comparing the variable stimulus with the standard 
in one time-order (i.e. the standard is presented before or after 
the variable) or space-order (i.e. the standard is to one side of the 
variable or the opposite) the estimated equality value is $L_1$ and
by reversing the order of presentation it comes out as $L_2$, then the
true threshold will be equal to X by the following equation

$$X = \frac{L_1 + L_2}{2}$$


X is thus the average of the two estimations with the opposite 
orders of presentation. 

But $L_1$ and $L_2$ may include, apart from a constant error of under
or over estimation, an error due to the order of presentation thus

$$L_1 = X + q$$

$$L_2 = X - q$$

where q is the error due to space or time order. Solving these equations
for q, we get

$$q = \frac{L_1 - L_2}{2}$$

A concrete example will make this clear. Let 20 cms. be the
standard and let mean estimations of the length with standard
in order 1 (say standard to the right) and in order 2 (standard
to the left) both be 21.2. Then the true threshold is

$$X = \frac{21.2 + 21.2}{2} = 21.2.$$

The constant error here is of 21.2-20=1.2 cms., the standard
of 20 cms. being over estimated under both orders by the subject
to this extent. And

$$q = \frac{21.2 - 21.2}{2} = 0.$$


If $L_1$ = 21.2 cms. in order 1 and $L_2$ = 18.8 cms. in order 2 then

$$X = \frac{21.2 + 18.8}{2} = 20$$

and C or constant error is equal to 0, C being defined as

$$C = X - S$$

where S is the standard.

This would show that though there is a systematic tendency to 
over-estimate on one side and under-estimate on the other, should 
the subject use both orders of presentation he can cancel out his 
error, which is entirely due to spatial or temporal positions of the 
compared objects. The error due to ordering now comes out 
as substantial. 


212-188 24 
ats ee ae 


Space or time error therefore amounts to 1.2 cms., its sign depend- 
ing on which order is regarded as coming first. A person’s 
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liability to this kind of error means a systematic distortion of the 
impression so that the quantitative aspect of perceptions is en- 
larged on one side and reduced on the other. This simple ration- 
ale is a classic example of how the analysis of error components 
is made possible in experimental situations of a psychological 
nature. The principle is utilized in other forms of measurement.
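The separation of the two error components can be checked with a few lines of code. The following is a minimal sketch, not part of the original procedure; the function name `decompose_errors` and its argument names are illustrative assumptions, and the figures are those of the worked example.

```python
def decompose_errors(l1, l2, standard):
    """Split two equality estimates into threshold, constant error and order error.

    l1, l2   : mean estimates obtained under the two orders of presentation
    standard : the value S of the standard stimulus
    """
    x = (l1 + l2) / 2      # X, the mean of the two orders
    c = x - standard       # C = X - S, the constant error
    q = (l1 - l2) / 2      # q, the space or time error
    return x, c, q


# The two cases discussed in the text, with a standard of 20 cms.
for l1, l2 in [(21.2, 21.2), (21.2, 18.8)]:
    x, c, q = decompose_errors(l1, l2, standard=20.0)
    print(f"L1={l1}  L2={l2}  X={x:.2f}  C={c:.2f}  q={q:.2f}")
```

In the first case the whole error is constant error and the order error vanishes; in the second the reverse holds.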

The Constant Method sets the pattern of an important type 
of solution in psychological measurement. It requires only a 
few values of the variable stimulus around the standard value. 
The two values are to be compared as before. The standard 
in the present example is set at a line of 20 cms. Five values of
the variable are selected for presentation: 18, 19, 20, 21, 22 cms.
Each value of the variable is presented in random order with the
standard 20 times, the total number of presentations being
5 × 20 = 100. Any given time or space order setting is used. The
results of 100 presentations are recorded as judgements of V>S 
(variable greater than standard), V<S (variable smaller than 
standard), equality and sometimes of doubt. The percentage 
results are tabulated thus: 


Standard = 20 cms., setting: Standard to the Right
(per cent judgements)


Value of
Variable      V>S     V<S     V=S     Total
   22          92       4       4      100
   21          80      12       8      100
   20          28      20      52      100
   19          20      64      16      100
   18           4      86      10      100


The following formula gives the lower and upper thresholds:


$$D_u = \frac{D_f(50 - \alpha) + D_a(\beta - 50)}{\beta - \alpha}$$


where $D_u$ is the upper differential threshold for standard 20 cms.
using the V>S column,
$D_f$ = value of the variable giving the nearest percentage above 50 for
column V>S,
$D_a$ = value of the variable giving the nearest percentage below 50 for
column V>S,
$\alpha$ = percentage just below 50 for column V>S,
$\beta$ = percentage just above 50 for column V>S.
$D_l$ (the lower differential threshold) is calculated by the same formula
using column V<S.
The solutions for $D_u$ and $D_l$ are:


$$D_u = \frac{21(50 - 28) + 20(80 - 50)}{80 - 28} = \frac{1062}{52} = 20.42 \text{ cms.}$$


$$D_l = \frac{19(50 - 20) + 20(64 - 50)}{64 - 20} = \frac{850}{44} = 19.32 \text{ cms.}$$


A graphical solution is also possible where the Y ordinate represents
the percentages and X the values of the variable stimulus. The
value of X against the 50 per cent point on the Y ordinate gives the
$D_u$ and $D_l$ readings on the curves for the V>S and V<S columns,
that is, the two thresholds.
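The threshold formula is a linear interpolation to the 50 per cent point, and it may be sketched in code as follows. The function name `differential_threshold` is an illustrative assumption, and the sketch assumes that no percentage in the column is exactly 50. The percentages are those of the table for the standard of 20 cms.

```python
def differential_threshold(values, percents):
    """Stimulus value at which a judgement column crosses 50 per cent.

    values   : the values of the variable stimulus
    percents : the percentage of judgements for each value, one column of the table
    """
    below = [(p, v) for v, p in zip(values, percents) if p < 50]
    above = [(p, v) for v, p in zip(values, percents) if p > 50]
    alpha, d_a = max(below)   # percentage just below 50 and its stimulus value
    beta, d_f = min(above)    # percentage just above 50 and its stimulus value
    return (d_f * (50 - alpha) + d_a * (beta - 50)) / (beta - alpha)


values = [22, 21, 20, 19, 18]
greater = [92, 80, 28, 20, 4]    # column V>S
smaller = [4, 12, 20, 64, 86]    # column V<S

d_u = differential_threshold(values, greater)
d_l = differential_threshold(values, smaller)
print(f"upper threshold = {d_u:.2f} cms., lower threshold = {d_l:.2f} cms.")
```

Only the two rows that straddle the 50 per cent point enter the calculation, which is why a few values of the variable around the standard are sufficient for this method.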

The constant process has a methodological significance and 
the essential elements of it should be grasped by the research work- 
er in education and psychology. Texts of experimental psycho- 
logy which give a fairly detailed account of it (Thurstone 1948, 
Collins and Drever 1934) might be consulted. 


Modern Psychophysics


The classical psychophysics gives us a physical continuum which 
is measurable and a parallel psychological continuum of absolute 
threshold and differential thresholds. Unfortunately only that 
kind of stimulus on the physical continuum is measurable which 
arouses simple, elementary sensations and this severely restricts 
the field of its application. Physical stimuli in ordinary life are 
rich and manifold in sensory modalities and cannot be stated in 
simple quantitative terms. And yet the psychological continuum 
remains unitary such as degree of liking for some works of art or 
articles of food or verbal statements on an issue of interest to us.
Sensations in their pure, quantitative aspects are artificial abstractions
in the laboratory and cannot be credited with the truly psychological
properties and inwardness of appeal of the more complex
physical stimuli with which human consciousness is bombarded
right through the day. The qualitative aspect of sensations
is not measurable on a single physical scale although psychologi- 
cally relative preference of qualitatively different stimuli is con- 
ceivable as registering on a single continuum of degrees of pre- 
ference. For example, various colours or smells though not pre- 
sentable as gradations on a single physical continuum do form 
an ordered series on a subjective scale of preference. If we are 
interested in the psychological continuum per se an attempt should
be made to devise a scale for it, independently of the unidimensio- 
nality and measurability of the related physical stimuli. The 
late Prof. L. L. Thurstone in one of his most original contributions 
proposed a method for this. It is based on his law of comparative 
judgement and is claimed as an advance on the classical psycho- 
physics. Its originality lies in applying psychophysics as a 
measurement technique to complex stimuli which form part of 
our real life experience and in securing a metric which can be 
applied directly to such stimuli without the mediation of simple 
physical properties which provide the thresholds in the sphere 
of pure elementary sensations. 

The theory which leads to the pair comparison method is well
described by Thurstone and Guilford, and a student desirous of
a detailed knowledge is referred to the excellent exposition by the
latter (Guilford 1954). In Thurstone’s rationale the concept 
of discriminal dispersion states the fact that when a stimulus is 
presented it elicits a response which varies ever so slightly from 
occasion to occasion. Symbolically 


$$R_{oj} = R_j + e_{oj}$$


where $R_{oj}$ = the response of the organism on the given occasion,
$R_j$ = the ‘true’ response of the organism to the stimulus and $e_{oj}$ =
the error component of the response on the given occasion.
When two stimuli are compared each has a ‘true’ correspon- 
ding response with a discriminal dispersion around it of occasion 
responses which include an error component. When two stimuli 
are, in the matter of affective response, far apart they are easily
discriminated. As they draw close together their discriminal
dispersions, which are assumed to be normal, overlap. The ‘true’
difference in two responses to two stimuli has around it a scatter
of differences perceived on each occasion which also is assumed
to have a normal dispersion. Further assuming the standard
deviation of each response distribution to be unity and the correlation
between the errors of responses to be zero we get


$$\sigma_{diff} = \sqrt{2}$$


This equation is reduced from


$$\sigma_{diff} = \sqrt{\sigma_x^2 + \sigma_y^2 - 2 r_{xy} \sigma_x \sigma_y}$$


by means of the above assumptions, x and y being the response
points for the occasion and $\sigma_x^2$, $\sigma_y^2$ being the variances of the responses
to the two stimuli. Since we do not yet have any method
of finding the scale positions of the responses on the response conti- 
nuum, this is largely a theoretic construct of no immediate practi- 
cal use, but it furnishes us with the basic concepts on which the 
method is developed. 


Scaling 


Operationally in Thurstone’s method we need a number of sub- 
jects or observers who are presented with all the stimuli in all 
possible pairs. If things are taken two by two the number of
possible pairs is found by the expression n(n-1)/2. Thus 5 stimuli
will produce 5(5-1)/2 = 10 pairs.
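The count of pairs can be verified by listing them. This is a small sketch only; the five colour names are borrowed from the example used later in this section, and the variable names are illustrative.

```python
from itertools import combinations

stimuli = ["Purple", "Orange", "Green", "Blue", "Red"]

# Every stimulus is paired once with every other stimulus.
pairs = list(combinations(stimuli, 2))

n = len(stimuli)
print(len(pairs), n * (n - 1) // 2)   # both counts come from n(n-1)/2
for first, second in pairs:
    print(first, "with", second)
```

The number of pairs grows roughly as the square of the number of stimuli, which is why the method is practicable only when the stimuli are few.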

The response is of two simple types, one stimulus being pre- 
ferred to the other or vice versa. When a large number of persons 
so respond we can say how many persons preferred one stimulus
to another, a fact easily expressible as a percentage or proportion
out of 1. We could say that of a sample of 60 persons 80 per cent
preferred painting x to painting y. The proportion is derived by 
dividing 80 by 100, the value being .80. It may be noted that 
in the classical psychophysics on account of the pure quantitative 
nature of the change in stimuli and their perceptual simplicity 
the same person could be repeatedly presented with the same pair 
(in random order with others as in the constant method) for
discrimination. For a single person, paintings x and y, having been
once seen and judged as to relative merit lose all meaning for 
subsequent presentations. Hence the need for multiplying the 
subjects or observers to obtain the proportion of preferences for 
each pair. Incidentally it may be noted that this makes all scaling 
of stimuli on the psychological response continuum a function 
of the group involved. For example, the artistic scale values of 
musical pieces will vary from one group of audience to another. 
The classical psychophysics concerned itself with the discrimina- 
tory power of the individual; ‘modern psychophysics’ is primarily 
interested in ‘scaling’ the stimuli on a psychological continuum 
by regarding the scale positions of compared stimuli as a function 
of group preferences. 

The rest of the technique is a fairly simple one. The propor- 
tions lead to a matrix of nxn size, n being the number of stimuli 
compared. The stimulus heading the column is preferred to 
itself, by assumption, to the extent of .50. The other entries 
show in what proportion it was preferred to every other stimulus. 
This leads to a matrix with .50 entries in the diagonals and comple- 
mentary entries in the symmetrical cells. For example, the 
stimuli are five colours and the following matrix of proportions 
is obtained: 


PROPORTIONS MATRIX FOR 5 COLOURS
(Colours heading columns preferred to the rest)


          Purple   Orange   Green   Blue   Red

Purple     .50      .60      .70     .80    .90
Orange     .40      .50      .65     .75    .85
Green      .30      .35      .50     .70    .80
Blue       .20      .25      .30     .50    .75
Red        .10      .15      .20     .25    .50


The proportion can be shown as a line dividing the frequency
of a normal curve (=1 for proportions) into two parts, one
showing the area of preference of x over y and the other showing
the preference of y over x. Thus the figure below shows preferences
in the proportion of .75 and .25 respectively for x and y.


[Figure: Proportions of .75 and .25 at point P.]


To make this division in the curve area of 1, a perpendicular
is to be drawn at point P. The normal probability
table enables us to read the distance of point P from the centre
or mean O. This distance is approximately +.67σ from the
mean in terms of unit standard deviation of a normal curve. At
this point we divide the curve area of 1 into .75 on the side of x
and .25 on the side of y preferences. The next step in the procedure
is to convert the proportion matrix into a matrix of corresponding
P points on the base line, as shown below for the example
in hand:

            Purple   Orange   Green    Blue    Red

Purple        .00      .25      .52     .84    1.28
Orange       -.25      .00      .39     .67    1.04
Green        -.52     -.39      .00     .52     .84
Blue         -.84     -.67     -.52     .00     .67
Red         -1.28    -1.04     -.84    -.67     .00

Total       -2.89    -1.85     -.45    1.36    3.83
Average      -.58     -.37     -.09     .27     .77
Scale value   .00      .21      .49     .85    1.35

It will be noticed that the across-the-diagonal complementary
matrix (yielding unity as the sum of each symmetrically
opposite pair of entries) has in the conversion become a symme- 
trical matrix. This is due to the fact that in the Normal Proba- 
bility Table the opposite entry is merely the reverse of the relation 
expressed by the first, the P point shifting the exact distance to the 
other side of mean O. Thus Orange is preferred to Purple by 
.60 fraction of the total number of subjects. Conversely Purple 
is preferred to Orange by .40 fraction of the sample. To express 
this latter value the point P shifts, an equal distance away from 
the mean to the minus side. Hence the repetition of the value 
with merely a change of sign. 

The stimuli are so arranged that the lowest totals of columns 
occur first and increase steadily. The column totals are the algebraic
sums of the cell entries; the next line gives the averages, each
column total being divided by 5, the number of entries in each
column. The last line gives the same values with the constant
+.58 added to each so as to eliminate the minus values and give
the lowest a commencing zero status. This zero value is arbitrary 
and derived from the column total of the least preferred item, 
which may contain some chance error. With further experimen- 
tal procedures it is possible to derive a psychologically meaningful 
zero point of the scale independently of the least preferred item 
which represents, on the basis of sampling of both stimuli and 
the subjects, a point of indifference. This extension of procedure
is described by Guilford (1954) and is recommended to those
sufficiently advanced in the use of statistics. It involves the conversion
of obtained values to absolute scale values by means of a
linear transformation of the algebraic form y = mx + c.
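The whole computation, from the proportions matrix to the scale values, may be sketched as below. This is an illustration under stated assumptions and not Thurstone's or Guilford's own procedure: the normal deviates are obtained from `statistics.NormalDist` of the Python standard library rather than read from a printed table, so the intermediate figures may differ from tabled values in the third decimal, and the function name `scale_values` is hypothetical.

```python
from statistics import NormalDist

colours = ["Purple", "Orange", "Green", "Blue", "Red"]

# proportions[i][j]: proportion of subjects preferring the column colour j
# to the row colour i (the diagonal is .50 by assumption).
proportions = [
    [0.50, 0.60, 0.70, 0.80, 0.90],
    [0.40, 0.50, 0.65, 0.75, 0.85],
    [0.30, 0.35, 0.50, 0.70, 0.80],
    [0.20, 0.25, 0.30, 0.50, 0.75],
    [0.10, 0.15, 0.20, 0.25, 0.50],
]


def scale_values(p_matrix):
    """Column averages of normal deviates, shifted so the lowest is zero."""
    std_normal = NormalDist()   # mean 0, standard deviation 1
    z = [[std_normal.inv_cdf(p) for p in row] for row in p_matrix]
    n = len(z)
    averages = [sum(z[i][j] for i in range(n)) / n for j in range(n)]
    lowest = min(averages)
    return [a - lowest for a in averages]


values = scale_values(proportions)
for name, value in zip(colours, values):
    print(f"{name:8s}{value:5.2f}")
```

Subtracting the lowest average has the same effect as adding the constant .58 in the text: the least preferred colour receives the arbitrary zero and the order and spacing of the others are unchanged.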

Both classical and Thurstonian psychophysics are methods of 
individual differences. The classical methods investigate experi- 
mentally the discriminative ability of the individual with regard 
to stimuli which are graded on physical scales; the Thurstonian 
method uses a statistical approach to indicate the scale positions of 
more complex stimuli on the continuum of relative preference. The
comparisons made in the constant method and in the method of pair
comparisons are similar in form, though their results are widely different.

Scaling is a significant point of departure in mental measurement.
Once the concept of a subjective continuum had taken
shape and found visible expression, the problem of devising
intensive scales for other subjective impressions was squarely faced
and a number of solutions proposed. Among these the techniques 
of attitude measurement are of considerable methodological in- 
terest and worth the careful study of any researcher in education 
and psychology. We shall turn now to Thurstone’s unique contri- 
bution in this field. 


VII THE SCALING OF ATTITUDE 
The Thurstone Method 


Attitude is a bipolar dispositional complex which, for the pur- 
poses of measurement, may be regarded as located on an intensity 
gradient ranging from extreme disapprobation to hearty approval 
in respect of a particular object, social institution or practice or 
a corresponding proposition e.g. regarding tobacco or alcohol, 
teacher training, the UNO and the dowry system. If such a 
disposition is accepted as a unidimensional, linear continuum 
then Thurstone’s proposal to measure it by means of statements 
scaled by the method of equal-appearing intervals (Guilford 1954) 
would apply. The procedure is fairly simple (Thurstone and 
Chave 1929). A large number of statements of various shades of 
favourable and unfavourable opinions on the issue under investi- 
gation are written down on slips of paper, which a large number 
of judges exercising complete detachment sort out into 11 piles 
ranging from the most hostile statements to the most favourable 
ones. The opinions are carefully worded so as to be clear and 
unequivocal. A count of the proportions of judges placing a 
statement-slip in the successive piles is taken and these are cumula- 
ted forward from the lowest (most unfavourable) to the highest 
(most favourable) category. The following two statements on 
teacher training illustrate the procedure: 


Statement 1. Teacher training serves no useful purpose.
Statement 2. Teacher training improves the quality of our teachers.


TABLE OF CUMULATIVE PROPORTIONS


Statements    A     B     C     D     E     F     G     H     I     J      K
             0-1   1-2   2-3   3-4   4-5   5-6   6-7   7-8   8-9   9-10  10-11

1.           .00   .05   .20   .60   .90  1.00  1.00  1.00  1.00  1.00  1.00

2.            …    .05   .10   .15   .20    …     …     …     …    .90   1.00


These two statements yield the following two curves:


[Figure: Curve of Statement 1, scale value = 4.69, with the .25 and .75 points at 4.10 and 5.40. Curve of Statement 2, scale value = 6.27, Q = 3.30, with the .25 and .75 points at 4.40 and 7.70.]


Scale value on the scale of 0 to 11 is given by the reading on the X
axis against the .50 cumulative proportion. For statements 1
and 2 the values are 4.69 and 6.27. The reliability of this scale
value is measured by the quartile deviation and is given by the
following equation,


$$Q = p_{.75} - p_{.25}$$


where Q is twice the quartile deviation and $p_{.75}$, $p_{.25}$ the scale values
against these proportions on the Y ordinate of proportions. The
Q values for the two statements are:


$$Q_1 = 5.40 - 4.10 = 1.30$$
$$Q_2 = 7.70 - 4.40 = 3.30$$


It will be immediately seen that the steeper the curve
is the less is its quartile deviation or dispersion on the
judgements scale and the more reliable it is. When a large
number (from 50 to 100) of statements have been picked for
good reliability and even distribution over the scale from 0 to 11,
the scale is ready for use. The respondent is to give his reaction
to each statement by endorsing or rejecting it. He obtains a limen
type of score, the mean or median of the scale values
of the statements he endorses. The rejected ones are simply
ignored. This mean or median value of the statements endorsed
by him is his place on the scale of attitude, which extends from
0 to 11, 0 expressing an extremely unfavourable attitude and
11 the other end of the attitude continuum.
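The reading of the scale value and of Q from a cumulative curve can be imitated by linear interpolation, as in the sketch below. The cumulative proportions and the endorsed scale values are hypothetical figures invented for illustration, not the data of the table above; the function name `scale_point` and the assumption that each pile is one unit wide are likewise assumptions of this sketch. A smooth curve drawn by hand would give slightly different readings.

```python
from statistics import median


def scale_point(cum_props, p):
    """Scale point at which the cumulative proportion reaches p.

    cum_props[k] is the proportion of judges placing the statement in
    pile k+1 or lower; pile k+1 is taken to span the interval k to k+1.
    """
    previous = 0.0
    for k, current in enumerate(cum_props):
        if current >= p:
            # straight-line interpolation inside the interval k .. k+1
            return k + (p - previous) / (current - previous)
        previous = current
    raise ValueError("cumulative proportions never reach p")


# Hypothetical judgements of one statement over the 11 piles.
cumulative = [0.00, 0.00, 0.05, 0.20, 0.60, 0.90, 1.00, 1.00, 1.00, 1.00, 1.00]

value = scale_point(cumulative, 0.50)
q = scale_point(cumulative, 0.75) - scale_point(cumulative, 0.25)
print(f"scale value = {value:.2f}, Q = {q:.2f}")

# A respondent's score: the median scale value of the statements he endorses
# (hypothetical values); rejected statements are ignored.
endorsed = [3.10, 4.75, 6.30]
print(f"attitude score = {median(endorsed):.2f}")
```

A statement with a small Q is one on which the judges agree closely, and it is such statements that are retained for the final scale.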


The Likert Scale and Remmers’ Method


Of course the pair comparison method could be applied if state- 
ments were few and carefully chosen so that they could be judged 
in pairs; and so can be a few other methods discussed at some
length by Guilford (1954). But the method frequently mentioned
with Thurstone’s is due to Likert (1932). The Likert scale uses items
worded for or against the proposition, with a five-point rating
response indicating the strength of the respondent’s approval
or disapproval of the statement. The checks or ticks on the five-point
rating response are weighted simply 1 to 5 (or 2 to 4 in case
of three category response). Item weights are summated over the total
number of items and a summative score obtained. This procedure
makes the Likert method very much like an ordinary test.
Eysenck and Crown (1949) have proposed a combination of the
two methods, Thurstone’s and Likert’s, by giving the statement
the Thurstone scale value and the response the Likert weight.
Both Thurstone and Likert scales are specific as to the issue or
object the attitude towards which is measured. Remmers (1934)
proposed a more generalized type of master scale which could
be used on any issue. The statements of his scale are of a
general condemnatory or approbatory nature, e.g.

(1) .... is an unmitigated evil.

(2) A good deal of progress would depend ultimately on ....
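The Likert summation described above amounts to very little arithmetic, as the following sketch shows. The function name `likert_score` and the responses are hypothetical. The reversal of weights for items worded against the proposition is a common scoring convention and an assumption of this sketch; the text states only that the weights are summed.

```python
def likert_score(responses, reversed_items=(), points=5):
    """Summated score for one respondent.

    responses      : the category ticked on each item, coded 1..points
    reversed_items : indexes of items worded against the proposition, whose
                     weights are reversed before summing (an assumption)
    """
    total = 0
    for index, response in enumerate(responses):
        if not 1 <= response <= points:
            raise ValueError(f"item {index}: response {response} is outside 1..{points}")
        total += (points + 1 - response) if index in reversed_items else response
    return total


# Four items; the last two are worded against the proposition.
score = likert_score([5, 4, 2, 1], reversed_items={2, 3})
print(score)
```

With the reversal, strong agreement with a favourable item and strong disagreement with an unfavourable one both add the highest weight to the total.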


Guttman’s Scale Analysis




Guttman (1941, 1950), an extremely original worker, offered a
criticism of the above types of scale from which he educed certain
original principles of scaling which have not been as well
received as they deserve. The scales so far considered permit
a person to endorse a severe opinion and refrain from endorsing
a mild one. Is it logical for a person to agree with ‘Prohibition
is the greatest single promoter of happiness in a civilized society’
and withhold agreement with ‘Prohibition will prevent the break-up
of many a family’? If the first statement occupies a higher
scale status than the second, should it not contain, in the manner
of a box within a box, assent to the second as a precondition of its
endorsement? Is it a logical scale of distance in which a yard does
not include a foot? Guttman brought up this pertinent criticism
in 1942 and subsequently proposed a form of scaling known as
scale analysis, which really is a procedure for devising a scale calculated
to be unidimensional and such as would lead us back
from the total value to the pattern of responses. The index of
reproducibility shows to what extent the principle of a stronger
opinion including and lying beyond the weaker is honoured by
any ordered string of statements. If reproducibility is low
and there is reversal of endorsements on statements of low
and high value, many of the conditions of scalability of the area
are not met. The technique of scalogram analysis sorts out statements
which yield a reproducible scale and has been described
competently by him (Stouffer 1950). There has been much pragmatically
oriented criticism of Guttman but the writer feels that
the logical foundations of the Guttman technique are correct
and unbiased scientific effort should be directed to use the Guttman
rationale for future developments of a constructive nature
in the field of attitude measurement. What the Guttman technique
is intended to achieve is suggested by the following scheme
of responses to 3 items having a dichotomous response:


Individuals      + (Agree) and - (Disagree) responses

                     (+)                (-)
                 3    2    1        3    2    1
     1           R    R    R
     2                R    R        R
     3                     R        R    R
     4                              R    R    R


where R indicates the endorsements.
The reproducibility criterion is obvious as is shown below:


Individual    Scale status    Reproducible item response
                              (items 3, 2, 1)

     1              3              + + +
     2              2              - + +
     3              1              - - +
     4              0              - - -

The basic measure of error is the percentage of errors of reproduc- 
tion of scale pattern. The intensity with which opinions are held 
is a correlate of the attitude continuum and is related to it by a 
U-shaped curve because extreme views express strong feelings.
The zero point of indifference is the point where the curve is at 
its lowest and closest to the base line of the attitude scale. If this 
is a sharp dip we have a location for the zero point of the scale 
which means indifference, but if the curve shows the shape of a 
blunt inverted wedge then we have an interval instead of a zero 
point. Some methods of treating intensity separately have been 
proposed (Katz 1944, Cantril 1946 and Suchman 1950). 
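The percentage of errors of reproduction can be computed mechanically once a rule for counting errors is fixed. The sketch below counts every response that departs from the ideal pattern implied by a person's total score; this is one common convention, adopted here as an assumption, and Guttman's own minimum-error count may give fewer errors. The function name `reproducibility` and the response data are hypothetical.

```python
def reproducibility(responses):
    """Proportion of responses reproduced from total scores alone.

    responses: one row per person, one column per item, 1 = agree, 0 = disagree.
    """
    n_items = len(responses[0])
    # Order the items from the most to the least frequently endorsed.
    popularity = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -popularity[j])

    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal pattern: agreement with the `score` most popular items only.
        ideal = {j: 1 if rank < score else 0 for rank, j in enumerate(order)}
        errors += sum(row[j] != ideal[j] for j in range(n_items))
    return 1 - errors / (len(responses) * n_items)


# Columns are items 1, 2, 3 in increasing order of severity.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
flawed = [[1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]

print(reproducibility(perfect))
print(round(reproducibility(flawed), 3))
```

In the second set the person who endorses only the severest item contributes two errors, since his total score of 1 predicts endorsement of the mildest item alone.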




Coombs’ Unfolding technique 


It may be recalled that under the Thurstone method judges 
are required to allocate statements into 11 piles from the point
of view of their strength. The personal attitude on the question 
is not allowed to affect the judgement. That such detachment 
is possible is a demonstrable fact. It has however been contended 
that one’s own status on the attitude scale is likely to colour one’s 
judgements. It would be more natural to judge every statement 
for its severity or favourableness from one’s habitual status on the 
attitude continuum. Coombs (1952) has developed the rationale 
of an ‘unfolding technique’ which bestows on the respondent the 
central status in the process of judging. Every person is required 
under his method to rank the statements in the order of their dis- 
tance from his point of view. A median status person will give 
a low rank to extreme statements irrespective of directions. Thus 
he becomes the hinge and the scale extending on both sides of him 
a folding tape which doubles over as suggested in the diagram 
below: 


[Figure: the attitude scale, running from +3 to -3, folded over at $P_i$.]


where $P_i$ is the position of person i on the attitude scale extending
from +3, most favourable, to -3, most unfavourable, and 0 the
point of neutrality or indifference. It is conceivable that for $P_i$
a mildly favourable attitude is most acceptable and he regards 
both 0 and + 2 positions as equally unacceptable and equidistant 
from himself. In Coombs’ model both the statement and the 
individual have a fixed locus on the scale and the latter must judge 
the former as departing more or less from his own locus and thus 
providing a direction-free rank order. Coombs’ method utilizes 
these rank orders to arrange the statements on the attitude conti- 
nuum. Both in Guttman’s scalogram and Coombs’ unfolding tech- 
niques we get an imperfect pattern which must be shorn of irregu- 
larities to attain an acceptable contour. The techniques however 
do not provide any method of identifying certain features of the 
data as irregular. It is as if in a jumble of lines we could discern 
the possibility of two figures and could not say which one to accept 
and which to drop out as meaningless scratches. The research
worker must remember the unfolding model as an alternative
when he considers the question of scaling.
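The folding of the scale at the respondent's own position can be illustrated in code. The sketch shows only the forward step of the model, namely how a person's direction-free rank order arises from fixed positions; it does not show Coombs' procedure for recovering the scale from a set of such rank orders. The statement labels, their positions and the function name `preference_order` are hypothetical.

```python
def preference_order(person, statements):
    """Statement labels ordered by their distance from the person's position."""
    return sorted(statements, key=lambda label: abs(statements[label] - person))


# Seven statements placed at the scale points -3 to +3.
statements = {"A": -3, "B": -2, "C": -1, "D": 0, "E": 1, "F": 2, "G": 3}

# A person holding a mildly favourable attitude, at +1.
order = preference_order(1, statements)
print(order)
```

The statements at 0 and +2 lie at the same distance from this person and so tie in his ranking, which is the situation described in the text.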


Lazarsfeld’s Latent Structure 


A yet more fundamental and stimulating reorientation to the 
theory of scaling is due to Lazarsfeld (Stouffer 1950, Green 1951,
Lazarsfeld 1954). His latent structure analysis is a general mathematical
formulation which incorporates within itself Guttman’s
idea of unidimensionality and relates it to the theory of factor 
analytic extraction of underlying uniformity which constitutes 
a genuinely scalable variable. In factorising tests we derive 
common factors which explain the intercorrelations of tests. 
Lazarsfeld is also interested in intercorrelations of items of attitude 
scales or other similar instruments of mental measurement. The 
Spearman view of factors holds that tests correlate because of an 
underlying common element called g. In this conceptual frame 
the actual test is a mass of heterogeneous elements and the g the 
pure, unidimensional and scalable variable. Lazarsfeld applies
these factorial criteria to the attitude area. Actually even Thur- 
stone’s multiple factor analysis can be regarded as an extension 
of the more inclusive principle of latent structure (Green 1952). 
In the sphere of attitude measurement latent class analysis is a 
specific form taken by the same principle. If we merge, say, two
distinct classes of persons into one sample and inter-correlate items
on its basis the correlations obtained may be due to the layering of 
the classes one upon the other. Consider the following diagram: 


[Figure: The effect of combining two classes on item correlation.]


Items within each class are independent; the merging of two classes 
produces the correlations. Solutions for $n_s$, the proportion of persons
in class s, and $v_{is}$, the probability of a positive response to item i
for class s, have been offered. Good descriptions of latent class
analysis method of attitude scaling have been given by Lazarsfeld 
himself (Lazarsfeld 1959) which should be read by the student 
intending to use the method. It is at once apparent that if indivi- 
duals are to be assigned to underlying classes of an attitude then 
we do not have a continuum but a nominal type of discrete 
variable. Item responses indicate the probability of a person 
belonging to a given class on an attitude variable. 
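That the merging of classes alone produces a correlation between items can be shown by direct calculation. The sketch below is an illustration of the principle only, not a method of solving for the latent classes; the class proportions and response probabilities are hypothetical, and the function name `merged_covariance` is an assumption.

```python
def merged_covariance(class_sizes, p_item1, p_item2):
    """Covariance of two items in a sample formed by merging latent classes.

    class_sizes : proportion of persons in each class (summing to 1)
    p_item1     : probability of a positive response to item 1 in each class
    p_item2     : probability of a positive response to item 2 in each class
    Within every class the two items are taken to be independent.
    """
    p1 = sum(n * a for n, a in zip(class_sizes, p_item1))
    p2 = sum(n * b for n, b in zip(class_sizes, p_item2))
    p_both = sum(n * a * b for n, a, b in zip(class_sizes, p_item1, p_item2))
    return p_both - p1 * p2


# One class taken alone: the items are uncorrelated.
single = merged_covariance([1.0], [0.8], [0.7])

# Two equal classes, one favourable and one unfavourable, merged into one sample.
merged = merged_covariance([0.5, 0.5], [0.8, 0.2], [0.7, 0.3])

print(f"single class: {single:.2f}, merged classes: {merged:.2f}")
```

The covariance in the merged sample arises wholly from the difference between the classes, since within each class it is zero by assumption.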

The research student will have now appreciated how varied
are the approaches to the single problem of scaling of stimuli
(whether sensory, perceptual or verbal) with which Thurstone
began his proposals to extend the coverage of the classical psychophysics.
All the methods briefly summarized here are of the
utmost importance in the scientific type of research in education 
and psychology. When a suitable problem presents itself the 
student should try to solve it by an appropriate method, the details 
of which must be followed step by step in the books recommended. 
Torgerson’s Theory and Methods of Scaling (1958) will be of immense 
use to the more advanced student in this field.


CHAPTER IV


STATISTICAL METHODS OF
RESEARCH IN EDUCATION AND
PSYCHOLOGY:


TESTING AND FACTOR ANALYSIS 
VIII MENTAL TESTING AS RESEARCH 
Research in Psychometrics 


Mental testing is an objective procedure for measuring with
varying degrees of fidelity and accuracy variables like native capacity,
acquired knowledge and skill or other educable and psychological
characteristics of human beings. It is an extremely advanced
and complex branch of study now and is based on a well-developed
framework of theory, ably and at length expounded in
literature of great technical interest to the advanced worker. The 
theory and procedures are being continuously subjected to search- 
ing criticism and revised. A good deal of research can take place 
in this highly technical field of test theory and procedures. Deve- 
loping the theory on which testing is based is one form of this research. 
Improvement and criticism of existing standard tests and construc- 
tion of new ones for the same or a new, till then uncovered, area 
also amounts to research activity, the size and adequacy of the 
work deciding its status. Psychological entities are by their very 
nature so inward, subtle and subjective that they cannot easily 
be measured in quantitative terms. The absence of an objective 
unit in which to measure any given trait like, say, mental ability 
is a sore handicap. Besides this the approach to mental functions 
is through their operation on external objects and symbols. All 
these deterrents have made the very commencement of mental 
testing a task of considerable genius and originality. During 
the first half of the present century more and more aspects of the 
human mind were brought within the range of testing. Devising 
a method for the measurement of a new and intractable area of 
mind is still a very taxing and difficult job requiring a high degree
of acumen and ingenuity in the worker. The same is true of a
new method for an area already somewhat inadequately covered. 
Thus in the field of personality assessment many proposals have 
been made and yet such is the complexity of the task and inade- 
quacy of the attempts that every now and then a new method 
based on somewhat different assumptions is put forward which 
by reason of its importance claims the status of research. 

In India we are a little backward in the matter of mental tests. 
We have no tradition in the indigenous education of such quick 
objective tests. Our method had been that of personal examina- 
tion like the viva voce and for the purposes in view it was quite 
satisfactory. Presently, however, we want measures of many 
aspects of the human personality and we, therefore, need the new 
kind of objective tests in large numbers. Much of our activity 
is concerned with the work of producing Indian adaptations of 
foreign tests. We have not yet come to grips with the problem 
of producing original tests of our own but if the present progress 
in test production is maintained the more original and brilliant 
of the workers will soon be making their own contribution to the 
world’s already formidable array of tests. A good adaptation 
with critical emendations is also acceptable as research activity
in education. 


Areas of Testing 


Testing applies to many areas in psychology but its most specta- 
cular achievements are to be seen in the cognitive group of abili- 
ties. Human ability is either specific or general and assumes in 
actual operation many forms. Tests have been put up for the 
measurement of practically every one of them. The traits covered 
are largely innate and the objective is to reduce the effect of en- 
vironmental differences in conditions of growth and development 
to a minimum. Juxtaposed to this is the area of learning and 
acquired ability which is covered by objective tests known as tests 
of achievement or attainment. Aptitude tests are ability tests 
which indicate promise in special types of learning or work. 
Under scaling we have already considered the problem of quantifi- 
cation of attitudes. Interest is an allied phenomenon and 
methods of mapping it have also been proposed. Personality 
and its malfunctioning are approached from many angles and
theoretic constructs, and a host of tests have been developed for
this area including outstanding and significant contributions like
the Rorschach and the Thematic Apperception tests. Traits of
personality like temperament, emotionality, character, perseveration,
sensory acuity and tendency to minor or major mental disturbance,
as also affiliation to multitudinous typological categories
have been subjected to testing with interesting results.

Measurement of mental characters can be achieved with varying
degrees of mathematical rigour through ranking, rating, pair
comparison and interval methods but the common form of the
objective test yields point scores out of a possible maximum. The
theory of test scores is a matter for the advanced student and the
expert but their simple use with the usual caution as to possibility
of error is only too common, although it does rather over-simplify
the picture. For in the use of the simple additive score a number
of assumptions are involved which could be questioned in a severe
rationale. During the course of this book it will not be possible
to deal with such controversial issues of a rather advanced nature
but the interested are referred to F. Lord’s Theory of Test Scores
(Doctoral Thesis, Princeton University) and H. Gulliksen’s Theory
of Mental Tests.


Types of tests 

Mental tests can be classified variously according to the principle
of differentiation employed. They can be considered as involving
apparatus of some kind in the administration or as based simply 
on the use of paper and pencil. There are tests which imply 
familiarity with the written form of a language, known as verbal 
tests and, as against these, there are those that can be solved on 
paper with a pencil but do not require knowledge of a language 
in its written form. The following scheme summarily exhibits 
the types of such contrasting categories into which most mental 
tests tend to fall: 


Classes                          Principle of Classification

Apparatus or Paper and Pencil    Material required for the test.
Individual or Group              Number of persons that can be given the test at a time.
Non-verbal or Verbal             Subject’s literacy or illiteracy.
Ability or Aptitude              The type of innate function in cognitive area tested, simple or complex, present or potential.
Ability/Aptitude or Attainment   Innate or acquired power.
Skill or Knowledge               Class of attainment.
General or Specific              Range of the application of power.
Single/Omnibus or Battery        Number of tests applied and their complexity.
Score or Mental age              The metric employed for evaluation of performance.


The test which yields an additive score is radically different 
from the test which yields a mental age. Any ordinary paper and 
pencil test will enable us to count the number of items correctly 
answered, which becomes a simple score. The process of arriving 
at this is a simple tally on the all-or-none principle of unit award 
for correct response to each item. This score can also be con- 
verted into I.Q. but is in nature different from the mental age 
as calculated by the methods employed in performance tests like 
Simon-Binet and its adaptations in the U.S.A. and U.K. The 
Simon-Binet type of performance test has for each year of age a 
number of test tasks carrying mental age credits of specified months. 
The subject is given the lowest group of test tasks in an individual 
administration and goes on to successive years’ groups until he 
comes to the last group which he completes correctly in entirety. 
He is allowed to go on with the succeeding items until he fails 
on a year group of items completely. Each test task in the age 
groups carries a mental age credit of some months. The ‘basal’ 
mental age is indicated by the group of items which the subject 
passed in entirety. For all subsequent fractional successes each 
item completed is given the specified mental age credit. These 
are totalled and converted into years and added to the ‘basal’
mental age (i.e. the mental age of the set of test tasks which the 
subject passed in entirety). This becomes the mental age of the 
person tested. Intelligence Quotient is given by the equation: 


$$\text{I.Q.} = \frac{\text{Mental Age}}{\text{Chronological Age}} \times 100$$

It is obvious that those who have the mental age associated
with their chronological age group are normal and get an I.Q.
of 100. As the mental age increases or decreases relatively to the
chronological age the I.Q. is more or less than 100. The performance
test of the Binet-Simon type on the construction side is
concerned largely with finding suitable test tasks for each age 
group which is imaginary in case of adult forms and real in case 
of growing children. The procedures relating to this are indicated 
briefly in the related manuals, texts and other technical literature. 
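
The arithmetic of the mental age and the I.Q. described above can be sketched in a few lines of Python. This is only an illustration: the function names, the credit of 2 months per task and the child's record are assumptions made for the example and are not taken from any actual Binet-Simon manual.

```python
def mental_age_months(basal_age_years, credits_months):
    """Basal mental age plus the month credits earned above the basal level."""
    return basal_age_years * 12 + sum(credits_months)


def intelligence_quotient(mental_age, chronological_age):
    """I.Q. = mental age / chronological age x 100 (both ages in months)."""
    return 100.0 * mental_age / chronological_age


# Hypothetical child aged 9 years: passes every task of year 8 (the basal
# age), then earns 2 months of credit on each of 4 tasks at year 9 and
# 2 tasks at year 10 before failing a whole year group.
ma = mental_age_months(8, [2] * 4 + [2] * 2)  # 96 + 12 = 108 months
iq = intelligence_quotient(ma, chronological_age=9 * 12)
print(ma / 12, iq)
```

A mental age equal to the chronological age gives an I.Q. of 100, as stated in the text; a mental age of 120 months at a chronological age of 96 months would give 125.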

The group variety of paper-and-pencil test is the more popular 
as it takes less time with a given group of examinees. It requires 
less training for administration and is interpreted readily in terms
of percentile or other kind of age or grade norms. We shall outline 
the standard procedure for this kind of test and discuss some of 
its rationale. 


Test Construction : 
SPECIFICATIONS 


All testing must start by a consideration of the limitations 
under which the test must be produced. Testing is essentially 
and basically a practical business and the objectives are invariably 
specific and concrete. Every other decision is determined by 
the nature of the objectives and the resources at one’s disposal, 
for the test-maker has to work under given and not ideal conditions. 
So the work must begin by a detailed set of specifications as to 
the purpose of the test and the time, money and personnel at the 
disposal of the test-maker. The population for which the test 
is to be made has to be defined. Then the purpose of the test 
is stated and in the light of this the type of test item to be used 
to measure the given trait or function is to be selected or devised. 
Here not merely the content of the item but also its form is to 
be considered. Sometimes to cover all aspects of the function 
a variety of items are to be included. There is as infinite a popula- 
tion of possible items as there is of possible examinees and the 
problem of sampling will arise in both spheres. This preliminary
set of considerations and decisions is known as ‘test specifications’
and to the extent these are clear and unambiguous, further progress
will be smooth and orderly without any need for revisions and
second thoughts. It may be decided, for example, that an attainment
test of mathematics or of general mental ability is to be
produced for class IX or for the 12-year-olds of a defined locality.
After the population of examinees has been designated the popula- 
tion of items from which a limited number are drawn is to be 
defined. This is a task requiring considerable ingenuity in the 
test-maker and familiarity with existing tests. The length of the 
test is also to be decided because on it depends the cost, in terms 
of time and money, of giving and scoring the test and interpreting 
the results. 


PRELIMINARY DRAFT: 


In the second phase the test-maker gets busy in a more detailed 
way with the preliminary draft. He considers the types of the 
items available to him and is in fact free to create original items 
of his own to cover the function or trait adequately. Here familia- 
rity with the existing tests in the area is a prime necessity. One 
has to consider now the pattern of the response expected. For 
economy the response sheet is separated from the test booklet, 
provided this does not materially increase the difficulty of making 
the response. Whether the response is given by indicating one
of many alternatives as correct or by inserting the answer has to
be decided, the former obviously giving scope to the examinees
to guess or take chances with the given set of alternatives and
to make a score on half-knowledge, ‘hunch’ or sheer luck. Once
the types of items have been fixed the test-maker must compile 
a large number of items of suitable difficulty. A rough idea of 
this can be obtained by trying a few items on a small group of 
examinees from the population. More than double the items 
necessary for the test must be collected. These are then prefaced
by carefully worded instructions which indicate briefly the nature 
and purpose of the test, the nature of the task with a few examples 
and exercises which are completed before the proper test is begun. 
This first draft of the test is then submitted to superiors or
co-workers for frank opinion and criticism. Many false assumptions,
slips and oversights are caught in this process; the step may seem
superfluous but it is of practical consequence and should not be
omitted. It is advisable at this point to reproduce
a few cheap copies and administer the preliminary draft
to a small group of, say, 30 to 50 examinees and check the answers. A
few further modifications may be suggested by this procedure.
After these have been incorporated the ‘preliminary draft’ is ready
for the press or for cyclostyling on the Roneo machine.


TRY-OUT:


The next stage is known as the try-out. At this stage the 
preliminary draft is administered to a large random sample of 
the population for which the test is being made. The size of 
the sample is fixed with an eye to the quick and easy derivation 
of subsequent indices of difficulty and discrimination which are 
used for picking out good items for the final test. There are 
various methods of deriving these indices for items and some 
tables and monographs have been prepared to facilitate the finding 
of these values from the performance of the sample on each item. 
The size of the sample for try-out will depend on what method 
or ready reckoners one intends to use for item-analysis. For 
example, there are tables connected with the performance on an 
item of the top 27 per cent and the bottom 27 per cent cases of the 
sample. Since the percentages of these top and bottom 27 per cent
cases passing and failing an item are used later, it will be economical
to have a size of sample, 27 per cent of which will be about 100, 
so that the percentage of failures and passes of these top and bottom 
100 are automatically calculated. Thus if a sample of 370 is 
used the top 27 per cent and the bottom 27 per cent of it will 
give us 100 best and 100 worst persons. The per cent passing 
the item from these top and bottom groups can now be 
referred to the table based on them and the indices read out. 
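
The choice of a try-out sample of 370 can be checked with a short sketch. The helper names below are hypothetical and the score list is invented; the only thing taken from the text is the rule that the top and bottom 27 per cent should each hold about 100 scripts.

```python
def tryout_sample_size(group_size=100, fraction=0.27):
    """Sample size whose top (or bottom) `fraction` holds about `group_size` scripts."""
    return round(group_size / fraction)


def extreme_groups(scores, fraction=0.27):
    """Return the top and bottom `fraction` of scores, highest first."""
    ordered = sorted(scores, reverse=True)
    k = round(len(ordered) * fraction)
    return ordered[:k], ordered[-k:]


n = tryout_sample_size()                    # 100 / .27 is about 370
top, bottom = extreme_groups(list(range(n)))
print(n, len(top), len(bottom))
```

With 100 scripts in each extreme group, the number passing an item in a group is at once the percentage passing, which is the economy the text has in view.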

The try-out is timed so that in an increasingly difficult test 
nearly 90 per cent complete the last item. In case of speeded tests 
where time is the material factor in producing the scatter of scores 
and the difficulty is nominal, a duration that produces good 
effort without fatigue, and good scatter should be used. There is a 
compromise to be made between allowing the early finishers to 
leave the room and thus discourage the slow ones and start a 
stampede, and keeping them at the desks until they grow restive 
and begin to interfere with the slow ones. Some of this time lag 
between early and late finishers can be used up by asking the 
examinees to revise their answers though this is not invariably
advisable. The scripts are next collected and scored against a
ready-made key. Hand scoring is slow and liable to chance error 
which should be checked by re-scoring a few sample batches of 
scripts (Kimball 1950).1 


ITEM-ANALYSIS:


The next phase is concerned with item-analysis which is a 
technical name for deriving for each item or question of the test 
two indices—one of its difficulty and another of its power to dis- 
criminate between the good and bad performers on the test. 
Many kinds of such indices have been proposed. Good discussions 
of these indices are available in H.S. Conrad’s Characteristics and 
Uses of Item-Analysis Data (American Psychological Association), 
F. B. Davis’ Item-Analysis Data (Harvard University) and H. Gul- 
liksen’s Theory of Mental Tests. One begins by arranging the 
scripts from the highest to the lowest obtained score. The scoring 
is done on the principle of awarding one score for each correctly 
attempted item. If the items are of the multiple choice type of 
response and the answer consists in indicating one of them as 
being right, the question of winning scores by taking chances 
blindly with the provided alternatives arises. If the response 
choices are many the chance score may not materially alter the 
picture of each person, but when these are few, say two, then 
even without any definite knowledge or ability a good gambler 
will win by chance nearly half the total possible scores. If most 
examinees complete all the items there is no need to worry about 
guessing as the advantage is equal for all. Only where the number
of unattempted items is large, the question of correction for guessing 
will arise. Some formulae for this situation have been considered 
by Gulliksen in his text on test theory. The better remedy, 
however, will be to increase the time or shorten the test so that 
large numbers of unattempted items do not occur. From the 
arranged scripts or answer-sheets the top 27 per cent and the bottom 
27 per cent are separately taken. Next the percentages of the 
two groups passing a given item are found. Tables carrying
marginally $p_L$ and $p_H$, i.e., the percentages of the lower and higher groups
passing the item, are then entered and yield an index of the relative
difficulty of the item and its correlation with the whole test. The
Item Analysis Table prepared by C.T. Fan for the Educational
Testing Service, Princeton, converts the average $p = (p_L + p_H)/2$
into an index $\Delta$ (delta) by the equation

$$\Delta = 13 + 4x$$

where $x$ = the normal deviate taken as positive for $p$’s less than
.50 and negative for $p$’s greater than .50, because a greater number
passing makes the item easier, so that the second term must become
subtractive. The constant of 13 is to keep all $\Delta$’s positive. The
multiplier 4 of the normal deviate, which ranges from +3 to -3,
is to stretch the range of $\Delta$ so that fractions may be rounded off
easily. Frances Swineford suggests that the difficulty values of
items should vary with the number of alternatives in multiple-choice
response according to the following scheme:


No. of choices    Appropriate $\Delta$
      2               10.3
      3               11.3
      4               11.7
      5               12.0
      6               12.2


1 Kimball provides an interesting treatment of this problem and should be
consulted by those who are conversant with Logarithms.

There is considerable sampling error in such indices of difficulty
and discrimination, specially in the latter. Items of 50 per cent
difficulty yielding $\Delta$ of 13 give the best spread but concentrate
discrimination in the middle range. Items should be taken at that
level of difficulty which belongs to the part of the range
of scores where maximum separation is wanted. For example,
if we wish to discriminate rather well among the bright ones then
items of high difficulty will answer the purpose. The correlation
of item and test given by r’s in the table should be above 0 but
low values of it can be well tolerated on account of their large
sampling error. On the basis of these two indices items to go
into the ‘final draft’ of the test are picked out.
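
The delta index can also be computed directly instead of being read from the table. The sketch below assumes the relation delta = 13 + 4x with x the normal deviate of the average proportion passing; the function names are hypothetical, and the simple difference between the two proportions is used here only as a stand-in for the tabled item-test correlation, which is not reproduced.

```python
from statistics import NormalDist


def item_delta(p_low, p_high):
    """Difficulty index delta = 13 + 4x for the average proportion passing."""
    p = (p_low + p_high) / 2
    # x is the normal deviate, positive when fewer than half pass (hard item).
    # inv_cdf needs 0 < p < 1, so items passed by all or by none are excluded.
    x = -NormalDist().inv_cdf(p)
    return 13 + 4 * x


def simple_discrimination(p_low, p_high):
    """Assumed stand-in for the tabled r: upper minus lower proportion passing."""
    return p_high - p_low


# Hypothetical item: 40 of the top 100 and 20 of the bottom 100 scripts pass it.
print(round(item_delta(0.20, 0.40), 1), simple_discrimination(0.20, 0.40))
```

An item passed by exactly half of the extreme groups gets a delta of 13; harder items lie above 13 and easier ones below it. An item passed by 75 per cent comes to about 10.3.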

There are innumerable ways of selecting items from the Try-out. 
Vernon (1948) and Anstey (1948) consider at length many
proposals for analysing items for their difficulty and discrimination.
In Edinburgh, which was the source of the well-known series of
Northumberland Tests, one method divides the scripts into six
sets from highest to lowest of 40 each and uses $S_1$ to $S_6$, the number
of correct answers in each set, for an item in the formula:


This formula gives the value of 1 for the index E when discrimination is
good and -1 when it is thoroughly the reverse. Two proposals
are of special interest. Bedell (Bedell 1950) demonstrates how
an optimum number of items can be determined for a test so that
the reliability is maintained without increasing the length of the
test. H. Walker and S. Cohen (1949) have proposed a method
of sequential sampling of persons from top and bottom to judge
the discrimination of an item on the basis of probability. This
means that we do not start with a set number of scripts to evaluate
every item but may judge an item on the basis of a few contrasting
cases only.


FINAL DRAFT:


The selected items go into the final draft which is ready for 
administration. The somewhat shorter final form is timed again 
so that approximately 90 per cent of the population complete 
the test in the given time without feeling unduly rushed. The
test can then be administered to as large a sample as is practicable 
for developing the well known test parameters of reliability, 
validity and norms. 


Meaningful scores


A final word of caution about the gross score in tests carrying
items with multiple choice responses. If an examinee is free to
pick out one of several alternatives as his choice of the correct
answer, he who is willing to take risks will make a proportion of
scores on chance hits alone. Even in the final draft this raises
the question of the chance score. The score is meaningful in
discriminating between the able and the weak only beyond the
chance score. The concept of ‘meaningful scores’ is useful and
gives the score range within which alone the score discriminates
between bad and good performers. The formula (Gulliksen
1950) for standard error of chance score is


$$S_c = \frac{\sqrt{K(A-1)}}{A}$$

where $S_c$ = standard error of chance score
      $K$ = number of items in the test
      $A$ = number of alternatives in multiple response items.

The average chance score is given by

$$M_c = \frac{K}{A}$$

Meaningful scores are then defined as

$$S_m = K - (M_c + 2S_c)$$

where $S_m$ = range of meaningful scores after the second term
in brackets. Thus for a case where $K = 50$, $A = 5$

$$M_c = \frac{50}{5} = 10, \qquad S_c = \frac{\sqrt{50 \times 4}}{5} = 2.83$$

and

$$S_m = 50 - (10 + 2 \times 2.83) \approx 34.$$


The meaningful scores extend, therefore, from 16 to 50. On
account of the multiple choice type of response we may assume
that scores up to 16 have been obtained by examinees by taking
chances with the alternatives over all items.
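
The worked case above can be reproduced with a brief sketch. The function names are hypothetical; the formulas are the ones given in the text for the average chance score, its standard error and the range of meaningful scores.

```python
import math


def chance_score_mean(k, a):
    """Average chance score: K / A."""
    return k / a


def chance_score_se(k, a):
    """Standard error of the chance score: sqrt(K(A - 1)) / A."""
    return math.sqrt(k * (a - 1)) / a


def meaningful_range(k, a):
    """Range of meaningful scores: K - (M_c + 2 S_c)."""
    return k - (chance_score_mean(k, a) + 2 * chance_score_se(k, a))


k, a = 50, 5
s_m = meaningful_range(k, a)
# Scores below K - S_m may be attributed to chance.
print(round(chance_score_se(k, a), 2), round(s_m), round(k - s_m))
```

For a true-false test (A = 2) of the same length the meaningful range shrinks considerably, which illustrates the earlier remark that few alternatives favour the gambler.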




Reliability 


Cronbach (Cronbach 1947) has systematized the thinking on 
reliability. He suggests that it assumes and appears in three 
forms, viz. as the coefficient of consistency, equivalence and stabi- 
lity. When a test is self-consistent, i.e. each part of it ‘agrees’ 
with every other so that inter-item correlation is high, we get a 
measure of its consistency. Kuder and Richardson (1939) supply
two formulae for this:


$$r_{tt} = \frac{n}{n-1}\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right) \qquad \text{(1) (K-R formula 20)}$$

$$r_{tt} = \frac{n}{n-1}\left(\frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}\right) \qquad \text{(2) (K-R formula 21)}$$

where $r_{tt}$ = the reliability of the test
      $n$ = the number of items
      $\sigma_t^2$ = the variance or standard deviation squared of the test
      $p$ = proportion of sample of examinees answering item correctly
      $q$ = $1 - p$
      $\bar{p}$ = average $p$ for $n$ items
      $\bar{q}$ = $1 - \bar{p}$
      $\sum$ = the summation sign, summation being over $n$ items.


These formulae are based on the simple fact that the test variance
is the sum of item variances and covariances. The variance of
an item is given by the product $pq$, i.e. the proportion passing it
multiplied by the proportion failing, which is $1-p$. Thus if on an
item $i$, 60 per cent pass then

$$\sigma_i^2 = .60 \times .40$$

or .24, where $\sigma_i^2$ means the variance of the item $i$. To the extent
items are consistent with each other their covariance or correlation
squared will be high, and individual, independent variances low.
In terms of standard scores or values every distribution has a variance
and standard deviation of unity. The second Kuder-Richardson
formula does not use the term $\sum pq$ which sums up
the individual item variances over $n$ items. Instead, for simplification,
it uses the average of $p$’s and $q$’s multiplied by $n$, the number
of items. This term could be written as $M(n-M)/n$, where $M$
is the mean number of right answers and $n$ the number of items.
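
Both Kuder-Richardson formulae can be worked from a table of item responses scored 1 for right and 0 for wrong. The sketch below is an illustration under stated assumptions: the function names and the five scripts are invented, and the test variance is taken as the population variance (division by the number of examinees).

```python
from statistics import mean, pvariance


def kr20(responses):
    """K-R formula 20 from rows of 0/1 item scores, one row per examinee."""
    n = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # score of each examinee
    var_t = pvariance(totals)                  # test variance
    p = [mean([row[i] for row in responses]) for i in range(n)]
    sum_pq = sum(pi * (1 - pi) for pi in p)    # sum of item variances
    return (n / (n - 1)) * (var_t - sum_pq) / var_t


def kr21(responses):
    """K-R formula 21, which needs only the mean and variance of the scores."""
    n = len(responses[0])
    totals = [sum(row) for row in responses]
    var_t = pvariance(totals)
    m = mean(totals)                           # mean number right
    return (n / (n - 1)) * (var_t - m * (n - m) / n) / var_t


# Hypothetical scripts: five examinees, four items of increasing difficulty.
scripts = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scripts), 3), round(kr21(scripts), 3))
```

For these scripts formula 21 gives the lower value of the two, since it replaces the separate item variances by their value at the average difficulty.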

Another approach to consistency reliability is through splitting
a test into two by pooling odd and even items separately and
recalculating the scores for the half-test for each person. These
are then correlated and the obtained value is treated as $r_{hh}$ for
half the test. This is then boosted for the full length test by the
Spearman-Brown prophecy formula which is

$$r_{tt} = \frac{2r_{hh}}{1 + r_{hh}}$$

where $r_{hh}$ = reliability for half test
      $r_{tt}$ = reliability for full test.

The general formula for predicting the reliability for any fraction
or multiple of a given test length is

$$r_{nn} = \frac{n\,r_{11}}{1 + (n-1)\,r_{11}}$$

where $r_{nn}$ = reliability of n length test
      $r_{11}$ = reliability of unit length test
and   $n$ = any multiple of unit length.


It will be seen that when $n = 2$, the general formula is converted into
the split-half case where, from the knowledge of reliability for half,
reliability for full length test is estimated. The difficulty with
the split-half method is that the test can be split into equal halves
in a number of ways and the odd-and-even is not a unique solution 
mathematically. From this point of view the Kuder-Richardson 
formula is better specially as Cronbach (1951) has shown that 
in the form of his generalized coefficient of Alpha it gives the mean 
of all possible split-half coefficients and is a lower-bound estimate 
of the common factor variance among the items. Loevinger 
(Loevinger 1947) has suggested H as an estimate of homogeneity 
of items based on the dispersion of the test and dispersion of items 
which correlate completely or not at all. 
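
The odd-even split and the Spearman-Brown correction may be sketched as follows. The function names and the five scripts are assumptions made for the illustration; the correlation is the ordinary product-moment coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_unit, n=2):
    """Reliability of a test n times the length of one with reliability r_unit."""
    return n * r_unit / (1 + (n - 1) * r_unit)


def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores, then step up to full length."""
    odd = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even), 2)


# Hypothetical scripts: five examinees, four items scored 1 or 0.
scripts = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(scripts), 2))
print(round(spearman_brown(0.60, 2), 2), round(spearman_brown(0.80, 0.5), 2))
```

The second line of output shows the general formula at work in both directions: doubling a test of reliability .60, and halving one of reliability .80.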

The equivalent forms solution of reliability is applicable to 
both speed and power tests. If the forms are alike the correlation 
between the forms will be fairly high but it is always less than 
for the same test in the test-retest situation which gives the stability 
coefficient, and also less than the consistency coefficient because 
in it the examinees are not affected by the diurnal variation. 
Equivalent or parallel forms of tests are not absolutely identical 
but they are sufficiently alike to be inter-changeable. The follow- 
ing considerations are to be borne in mind in producing parallel 
forms of tests: that the same functions must be tested; this condi- 
tion refers to the similarity of content and form of items; the distri- 
butions on the two forms should be of the same shape roughly and 
the means and the standard deviations should be also approximately
equal. A statistical criterion of parallel tests has been
developed by S. S. Wilks (Gulliksen 1950) but is of use only to
the more advanced worker and applies only when more than
two tests are being considered, a situation rather rare in ordinary
research. Parallel forms can be developed as sub-tests of a longer
test by plotting all items on a graph paper one arm of which represents
$p$, the proportion passing the item, and the other $r$, the item-test
correlation. Items falling in the same area of the plot can
be assigned to the parallel forms as having equal difficulty and 
consistency (Gulliksen 1950). Such sub-tests can be tried out 
on random samples and have a fair chance of giving means and 
standard deviations close to each other and tolerably good correla- 
tions. The criterion of parallel tests can be applied to such sub- 
tests. It may be noted in passing that parallel tests do not yield 
a reliability in principle different from the consistency reliability, 
one form of which is the split-half already discussed. In both 
cases correlations between pools of items justify the belief that the 
forms are measuring identical functions. 

The test-retest procedure is used in case of speeded tests where 
difficulty of items is no serious matter and memory effect or practice 
is not likely to affect the performance materially. In this the 
same test is given after a time lapse and a second score is obtained. 
This is correlated with the first score and the result is the coefficient 
of stability because it shows whether over a period of time the 
examinees have retained their relative positions. The scores on 
all second trials on the same or a parallel test are likely to show 
a small gain but if the positions of persons remain the same the 
test should be considered reliable. 
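
The point that a uniform gain on the second trial leaves the stability coefficient untouched can be seen in a small sketch. The scores below are invented and the function name is hypothetical.

```python
from math import sqrt


def stability_coefficient(first, second):
    """Product-moment correlation between test and retest scores."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    s11 = sum((a - m1) ** 2 for a in first)
    s22 = sum((b - m2) ** 2 for b in second)
    return s12 / sqrt(s11 * s22)


first_trial = [12, 15, 20, 24, 30]
uniform_gain = [14, 17, 22, 26, 32]   # every examinee gains 2 points
reshuffled = [26, 14, 32, 17, 22]     # same scores, positions disturbed

print(round(stability_coefficient(first_trial, uniform_gain), 3))
print(round(stability_coefficient(first_trial, reshuffled), 3))
```

The first coefficient is unity for practical purposes because relative positions are preserved; the second falls well below it although the set of retest scores is the same.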

Basically, reliability is concerned with the error component of
obtained values. Conceptually, obtained variance of scores is
made up of ‘true’ variance and the ‘error’ variance. Symbolically

$$\sigma_o^2 = \sigma_\infty^2 + \sigma_e^2$$

where $\sigma_o^2$ = obtained variance
      $\sigma_\infty^2$ = ‘true’ variance
      $\sigma_e^2$ = ‘error’ variance.

Dividing through by $\sigma_o^2$ we get the proportions of true and error
variance:

$$1 = \frac{\sigma_\infty^2}{\sigma_o^2} + \frac{\sigma_e^2}{\sigma_o^2}$$

or

$$r_{tt} = \frac{\sigma_\infty^2}{\sigma_o^2} = 1 - \frac{\sigma_e^2}{\sigma_o^2}$$

Reliability is here defined as the ‘true’ component of $\sigma_o^2$. The
term $\sigma_e^2/\sigma_o^2$ in the above equation is equal to $\sigma_d^2/\sigma_o^2$, where $d$ is the
difference between the two scores obtained for the sake of reliability,
$\sigma_d^2$ being the variance of difference scores or simply
the error variance (Rulon 1939). $r_{tt}$ is the proportion of ‘true’
variance whereas $\sqrt{r_{tt}}$, known as the index of reliability, is the
correlation between the obtained scores and the ‘true’ scores.


Error of Measurement


The reliability coefficient is ‘a pure number’ and always a 
fraction of one. It gives us the reliability of the test as a whole 
but tells us nothing as to how an obtained score is to be judged. 
For this a more practical measure derived from the standard devia- 
tion of the test and its reliability coefficient is used. It is known 
as the standard error of measurement and is given by the formula 


$$\sigma_e = \sigma_t\sqrt{1 - r_{tt}}$$

where $\sigma_e$ = standard error of measurement
      $\sigma_t$ = standard deviation of the test
      $r_{tt}$ = reliability of the test.

The standard error of measurement is used to indicate the margins
of error around true scores, e.g. for $\sigma_t = 12$, $r_{tt} = .91$, the standard error
of measurement will be 3.6. A true score of 40 can be shown
as having twice this size of margins of error at approximately the
5 per cent level of confidence. Thus there is only about a 5 per cent chance
that the examinee with a true score of 40 will have an obtained
score outside the limits

$$40 \pm 7.2$$

This means that his score is unlikely to exceed 47 or fall below 33.
Lord has proposed a simpler, less restrictive formula for the
standard error of an obtained score (Lord 1955, Varma 1962) viz:


$$\text{S.E.}_{t_a} = \sqrt{\frac{t_a(n - t_a)}{n - 1}}$$

where $t_a$ = test score of examinee $a$
      $n$ = number of items in the test.
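
Both standard errors are easily computed. In the sketch below the function names are hypothetical, and Lord's formula is taken in the form sqrt(t(n - t)/(n - 1)), which is an assumption as to its exact reading; the test of 50 items is invented for the comparison.

```python
from math import sqrt


def standard_error_of_measurement(sd_test, reliability):
    """Standard deviation of the test times sqrt(1 - reliability)."""
    return sd_test * sqrt(1 - reliability)


def lord_standard_error(score, n_items):
    """Assumed form of Lord's formula: sqrt(t(n - t) / (n - 1))."""
    return sqrt(score * (n_items - score) / (n_items - 1))


sem = standard_error_of_measurement(12, 0.91)
print(round(sem, 1), round(40 - 2 * sem, 1), round(40 + 2 * sem, 1))

# Lord's value depends on the score itself: largest near the middle of a
# hypothetical 50-item test and zero at a score of 0 or 50.
print([round(lord_standard_error(t, 50), 2) for t in (0, 10, 25, 40, 50)])
```

The first line reproduces the figures of the example, a standard error of 3.6 and limits of 32.8 and 47.2. The second shows that the one formula gives a single margin for all examinees while the other varies the margin with the score.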


Validity


A test is said to be valid when it measures what it purports 
and claims to measure. Validity is established by correlating a 
test with its criterion. The criterion is some independent valuation, 
assessment or measure of the same attribute as is measured by 
the test. This criterion is available in many forms ranging from 
the rating of a foreman to a scholastic mark or a teacher’s estimate 
of a student’s standing in some trait. Test construction and 
factor psychology have reached a point of sophistication where a 
standard and recognized test may serve as a criterion for another. 
The psychometrist’s understanding of the test items and what 
they measure is so good now that in experimental situations he can 
regard the Kuder-Richardson consistency estimate of reliability 
as a type of validity in itself, i.e. the self-justification of a test 
that it is a homogeneous measure of a hypothesized trait or ability. 
This is the form of validity known as the construct validity. It is 
concerned with indicating the psychological functions which 
determine the performance on the test. Cronbach’s coefficient
of Alpha is a value which furnishes information regarding the 
internal structure of the test. 

The ‘technical recommendations’ of the Cronbach Committee
(Psychological Bulletin Supplement 51, 2 Part 2, 1954) distinguish,
apart from construct validity, three others.

Content Validity is concerned with the coverage of the area to 
be measured. If a general ability implies a variety of functions 
then a test would be as valid as the proportion of functions 
tested by it. If all the essential aspects of a type of activity are 
measured good coverage may be obtained and good validity 
may be expected provided fair samples of each are included. 
Concurrent and predictive types of validity refer to the agreement of
the test with an external criterion available immediately or at a
later date. Thus if test results are correlated with the teacher’s
estimates concurrent validity will be obtained. On the other
hand if there is a time lag in the maturing of criterion values as
in the case of examination results which are available at the end
of a year, the test may be treated as a predictor of those results.

Validity is thus a concept related to certain aims which the 
test seeks to achieve. To the extent the specified aims are achieved
validity will be good. That is the most general and acceptable 
description of test validity. 

Validity is sometimes expressed in the form of expectancy charts. 
These show the criterion on the X arm of the scatter diagram 
and the test scores on the Y arm with frequencies in the cells. 
These could be shown for better comprehension as percentages 
so that the marginal totals for score intervals add to hundred. 
Then for each test score obtained a prediction in percentages 
can be made on the basis of the percentage of persons in that 
score interval and the criterion categories. For example: 


                      Criterion
Test Score |   1     2     3     4     5  | Total
90-99      |              10    70    20  |  100
80-89      |         5    15    65    15  |  100
70-79      |        20    70    10        |  100
60-69      |  10    50    40              |  100
50-59      |  90    10                    |  100

This kind of device is frequent in reporting validity on job
success but is more a method of prediction than a summary index
of validity.
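
The conversion of cell frequencies into an expectancy chart can be sketched as follows. The function name and the frequencies are invented for the illustration and do not reproduce the chart above; the criterion is reduced to three categories for brevity.

```python
def expectancy_chart(frequencies):
    """Turn cell frequencies into percentages of each score interval.

    `frequencies` maps a score interval to the counts of persons falling
    in each criterion category, lowest category first.
    """
    chart = {}
    for interval, counts in frequencies.items():
        total = sum(counts)
        chart[interval] = [round(100 * c / total) for c in counts]
    return chart


# Hypothetical frequencies for three criterion categories.
observed = {
    "70-79": [4, 14, 2],
    "60-69": [5, 25, 20],
}
for interval, percentages in expectancy_chart(observed).items():
    print(interval, percentages)
```

Each row then reads as a prediction: of persons scoring in that interval, the stated percentage may be expected in each criterion category. Rounding may leave a row a point or so off the total of one hundred.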

Another approach to validity is via what is known as the extreme
groups method. Those who are recognized as very good on the
criterion are separated from those who are very bad. The
means of the test score of both groups are compared. If the
difference is significantly in favour of the superior group the test
is valid. This is a procedure very much like finding out a biserial
correlation between the criterion contrast groups and the continuous
test scores for the entire group.


Factors affecting Validity 


It is obvious that if the test or the criterion, or both, are unreliable
the correlation between them will partly suffer on this account,
quite apart from the fact of its true intrinsic size. If a wholly 
reliable test and criterion could be found the correlation between 
them will be an exact value expressing a relationship between 
a true score and a true criterion. But things are never this good. 
Unfortunately the reliability of the test is generally known whereas 
that of the criterion is most often not available. In such a situa- 
tion, if we are prepared to regard the criterion as wholly reliable 
and are, therefore, willing to err on the side of caution and under- 
estimation, the obtained validity coefficient can be corrected for 
the unreliability of the test, by the formula: 


$$r_{\infty y} = \frac{r_{xy}}{\sqrt{r_{xx}}}$$

where $r_{\infty y}$ = the corrected validity coefficient
      $r_{xy}$ = the obtained validity coefficient
      $r_{xx}$ = the reliability of the test.


Tox Sives us ani idea how good the correlation ‘could be if the 
test had a reliability of 1, the criterion being assumed to be wholly 
perfect in that respect already. It is similarly. possible to correct 
the obtained validity for unreliability of the criterion, assuming 
the test to be perfect, when Ty takes the place of r,, in the 
denominator. But if the reliability of both is known then the 
formula becomes: 
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where r,,, = the corrected validity coefficient Ty and aa 
Ty, the two reliabilities. r,. expresses the ‘true’ degree of 
relationship between test and criterion. It may, however, be 
noted that this is purely a theoretic boost to the obtained correlation 
between the test and its criterion. It is a theoretic argument 
rather than an actual practical gain. 
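
The correction for unreliability amounts to a single division, and a short sketch may make the arithmetic concrete. The function name `correct_for_attenuation` and the figures used (an obtained validity of .45, reliabilities of .81 and .64) are hypothetical and chosen only for illustration; a reliability left at 1.0 means that the measure is assumed to be perfectly reliable.

```python
import math


def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Divide the obtained validity by the square root of the reliabilities.

    r_xy: obtained validity coefficient
    r_xx: reliability of the test (1.0 = assumed perfectly reliable)
    r_yy: reliability of the criterion (1.0 = assumed perfectly reliable)
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie between 0 (exclusive) and 1")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical figures, not taken from the text.
test_only = correct_for_attenuation(0.45, r_xx=0.81)
both = correct_for_attenuation(0.45, r_xx=0.81, r_yy=0.64)
print(round(test_only, 3), round(both, 3))
```

Correcting for both reliabilities always gives the larger figure, which is why the text treats the single correction as the cautious estimate.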

Since the reliability has been shown as increasing with test 
length it is obvious that the validity of a test would depend partly 
on its reduced or increased length. This increase in a validity 
coefficient due to increase in test length can be estimated by the 
formula: 


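$$r_{y(nx)} = \frac{r_{xy}}{\sqrt{\dfrac{1 - r_{xx}}{n} + r_{xx}}}$$

where $r_{y(nx)}$ = the increased validity for a test of n times the length,
y being the criterion
n = the number of times the test is lengthened
$r_{xy}$ and $r_{xx}$ = the obtained validity and the reliability of the test, as before.


The above equation can be solved for n also, given the desired
validity that is to be attained. The formula then reads

$$n = \frac{1 - r_{xx}}{\dfrac{r_{xy}^2}{r_{y(nx)}^2} - r_{xx}}$$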
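
The two forms of the test-length formula can be checked against each other numerically. The sketch below assumes the usual form of the relation, in which the validity of a test lengthened n times is the obtained validity divided by the square root of (1 - r_xx)/n + r_xx; the function names and the figures (validity .40, reliability .64) are hypothetical.

```python
import math


def validity_of_lengthened_test(r_xy, r_xx, n):
    """Validity expected when the test is made n times as long."""
    return r_xy / math.sqrt((1.0 - r_xx) / n + r_xx)


def lengthening_needed(r_xy, r_xx, target):
    """Number of times the test must be lengthened to reach `target` validity."""
    denominator = (r_xy / target) ** 2 - r_xx
    if denominator <= 0:
        raise ValueError("target lies beyond what lengthening alone can reach")
    return (1.0 - r_xx) / denominator


# Hypothetical test: obtained validity .40, reliability .64.
doubled = validity_of_lengthened_test(0.40, 0.64, n=2)
print(round(doubled, 4))
# Solving back for n from the validity just obtained should return 2.
print(round(lengthening_needed(0.40, 0.64, target=doubled), 4))
```

However long the test is made, the validity cannot pass r_xy divided by the square root of r_xx (here .50), which is the value given by the correction for unreliability of the test.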


Validity sometimes suffers because good correlations cannot be
obtained over a short range of the variables. Consider the scatter
on a bivariate surface given on page 105.

[Figure: bivariate scatter of Y against X, with X marked at 0, 30, 70 and 100]
EFFECT OF RESTRICTION OF RANGE ON r

It is obvious that the $r_{xy}$ for the plot with range of values from
20 to 50 in Y and 30 to 70 in X will be much less than that for
the plot with full range of 0 to 80 and 0 to 100. In predictive
validity, a portion of the total number of applicants who are given
the test are not taken in and the selection deprives the tester of
the persons belonging to the lower range of the criterion and the
test score. Next, of those remaining, many drop out during the
period of maturation of the criterion values (i.e. the completion
of a course of training spread over a period of time). Thus a
much smaller group of persons emerges finally with criterion
values complete for the correlation between the predictive test
score and the criterion to be calculated. This attrition in N
due to time lag is a formidable source of restriction of range of
ability and consequent reduction of the validity. If the entire
batch of applicants who were given the selection test had been
put into training the greater range of ability would have improved
the validity. A number of corrections of the validity coefficient
(or for that matter any correlation between two variables one
of which suffers restriction of range from selection) are possible
considering the nature of the restriction that occurs.

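In our case the test was given to all applicants but $r_{xy}$ is available
for the selected ones only, who were put to training on the
basis of the test. By the following formula the validity for the
entire group can be estimated:

$$R_{xy} = \frac{r_{xy}\,(\Sigma_x/\sigma_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2\,(\Sigma_x^2/\sigma_x^2)}}$$

where $R_{xy}$ = the validity estimated for the entire group
$r_{xy}$ = the validity obtained in the restricted sample
$\Sigma_x$ = σ of the test for entire sample
$\sigma_x$ = σ of the test for restricted sample.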


The complications which arise with changes of the selection variable 
(which restricts the range) lead to modifications of this formula 
to meet those altered situations. ‘Selection variable’ here refers 
to the variable providing the basis of explicit selection; incidental 
selection (with consequent restriction of range) will automatically 
take place on all other variables correlating well with the selection 
variable. Effects of selection have been considered comprehensively 
by Gulliksen (1950). Thorndike (1949) gives a simpler account 
for some special cases. 
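
A small numerical sketch may help to show how much a restricted range can depress a validity coefficient. It assumes the usual correction for explicit selection on the test itself, in which only the ratio of the two standard deviations of the test is needed; the function name and the figures are hypothetical.

```python
import math


def correct_for_restriction(r_restricted, sd_entire, sd_restricted):
    """Estimate the validity in the entire group from the selected group.

    Assumes explicit selection on the test, so that the spread of test
    scores is known for both the entire and the restricted sample.
    """
    k = sd_entire / sd_restricted
    return (r_restricted * k) / math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted * k) ** 2
    )


# Hypothetical figures: validity .30 among those selected, whose test
# scores have half the spread of the whole batch of applicants.
estimate = correct_for_restriction(0.30, sd_entire=20.0, sd_restricted=10.0)
print(round(estimate, 4))
```

When the two standard deviations are equal the function returns the obtained coefficient unchanged, as it should when no restriction has taken place.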


Norms


Norms are the criteria of normality derived for a group in the 
light of which individuals can be judged. The term ‘norm’ is 
sometimes distinguished from ‘standards’ as meaning the levels 
of behaviour which actually exist in a group as against standards 
which represent levels that are regarded as desirable of achieve- 
ment for the group. Standards would normally be meaningful 
in scholastic attainment where certain objectives are aimed at 
by the curriculum and the instruction. In ability tests, however, 
norms are set by the levels of performance actually attained by 
the population in question. Three types of norms are commonly 
used and recognized. The first is the percentile norm. In this 
test scores are calculated for every ten per cent of the sample 
beginning with the lowest. Thus the 10th, 20th, 30th, 40th, 
50th, 60th, 70th, 80th, 90th and 100th percentile scores are shown. 
These are sometimes known as decile norms. Individual per- 
formance is relegated to one of these segments of the score scale 
and the position of the individual in the group is thereby described 
in terms of Percentile Rank, e.g. it could be said that he has 60 
per cent of the population below him and is somewhat better 
than the median person. 
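
Decile norms and percentile ranks are simple counting operations, as the following sketch suggests. The norm group is invented (one person at each raw score from 1 to 100), the function names are hypothetical, and the percentile rank is taken here simply as the per cent of the group scoring below the individual, which is one of several conventions in use.

```python
import statistics


def decile_norms(scores):
    """Score values at the 10th, 20th, ... 90th percentiles of the sample."""
    return statistics.quantiles(scores, n=10)


def percentile_rank(score, scores):
    """Per cent of the norm group scoring below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)


# Hypothetical norm group: raw scores 1 to 100, one person at each score.
norm_group = list(range(1, 101))
print(decile_norms(norm_group))
print(percentile_rank(61, norm_group))
```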

When the population is not homogeneous but covers a wide 
range of scholastic attainment or age, norms related to those 
ranges are evolved and are known as Grade (=class, in India) 
and Age norms. Here each grade and age is represented by its 
central tendency and sometimes a few other points on the scale
and individual performances are referred to them for comparison.
It is open to the test makers to use σ scores in place of deciles
around the central tendency of each grade or age group. A type
of norm somewhat different from these is that associated with
the probability of success on any job or course. The obtained
score is referred to an ogive curve of percentiles, such as the one shown
below, which gives the percentile rank of the individual:
person P has 75 per cent of the persons below him.

[Figure: ogive of cumulative proportion (.25, .50, .75, 1.00) against SCORES, with the score of person P marked]
PERCENTILE POSITION OF PERSON P

If the criterion has a critical value below which a person fails then by the
use of the principle of regression (discussed later) a score can be
established below which the probability of success is low. This
actually is less a function of norm than of prediction but tests 
are so invariably associated with the predictive process that 
norms of this kind are meaningful in certain situations. 


Score transformation 


The raw score is wholly arbitrary as to n, the number of items,
and as to its mean or median. On this account tests are not comparable
by the raw score. To achieve a degree of uniformity for
them various types of transformations are recommended. The
most useful and familiar transformations are the percentile and 
the z-scores. Percentiles are close together in the middle ranges 
and widely scattered at the extreme ends of the score scale. This 
is due to the small frequencies in all distributions at the ends and 
large frequencies around the median, so that every nth person
takes one over fair distances at extreme limits and confines one
sometimes to the same interval in the middle ranges. This will
be clear from the following diagram.

[Diagram: the Scores scale set against the Decile points, which crowd together in the middle of the scale and spread apart at the two ends]
FIG. 8


Percentiles are frequently blamed for unequal runs between 
norms but are a very realistic score by themselves if you do not 
proceed to subject them to further mathematical grinding. By 
themselves they indicate quite lucidly and vividly the position of an 
individual in the crowd to which he belongs. The z-score has equal
base line differences and its defects of having negative and fractional 
values are quickly removed by a simple transformation equation 
of the following kind:

$$\frac{Y_i - 50}{10} = \frac{X_i - M}{\sigma}$$

where $Y_i$ is the revised score equivalent of raw score $X_i$, and M and σ
are the Mean and Standard deviation of the raw scores. The above
equation can be solved for each value of X and the resulting
distribution will have a mean of 50 and a σ of 10. This is a
slightly old-fashioned transformation by now. The Americans
used the Stanine scale during the last War. Its name is derived
from standard units and a range of nine steps. The ‘standard nine’
has a mean of 5 and a σ of about 2 (1.96). It ranges from 1 to 9,
0 and 10 at the two tails having been absorbed by 1 and 9
respectively.
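
The linear transformation can be sketched in a few lines. The raw scores below are invented, the function names are hypothetical, and the standard deviation is computed with N in the denominator. The stanine function is only a linear approximation (mean 5, σ 2, rounded to the nearest whole step, with anything beyond the ends folded into 1 and 9); it is an assumption of this sketch, not the normalized percentage bands often used to define stanines.

```python
import math
import statistics


def to_standard_scale(raw_scores, new_mean=50.0, new_sd=10.0):
    """Linear transformation: Y = new_mean + new_sd * (X - M) / sigma."""
    m = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    return [new_mean + new_sd * (x - m) / sigma for x in raw_scores]


def to_stanine(raw_scores):
    """Linear approximation to stanines, tails folded into 1 and 9."""
    scaled = to_standard_scale(raw_scores, new_mean=5.0, new_sd=2.0)
    return [min(9, max(1, math.floor(y + 0.5))) for y in scaled]


raw = [10, 20, 30, 40, 50]  # hypothetical raw scores
print([round(y, 1) for y in to_standard_scale(raw)])
print(to_stanine(raw))
```

Because the transformation is linear, the shape of the raw score distribution is carried over unchanged; only the mean and the unit are altered.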

All these transformations unfortunately leave the shape of the 
raw score distribution untouched. McCall, therefore, proposed 
a T-score system which readjusted the scores so that frequencies
were normally distributed. This is a radical proposal
and the fact should be emphasized. The T-score normalizes an
obtained distribution. Samples even from normally distributed
populations are never strictly normal. If for further use to be
made of the distribution one feels justified in removing the irregularities
of an obtained distribution, T-score transformation can
be made with the help of ready-made tables (Guilford 1950).
A final type of transformation worth mentioning is that of ranks
converted into scores on a 100 point scale. The rank position
is changed to per cent position by the formula

$$\text{Per cent position} = \frac{100\,(R - 0.5)}{N}$$

where R is the rank of the person among N individuals. This is
referred to a ready-made table and the position transferred to a
score as out of 100 (Hull 1928). Situations in research can arise
when ranked data need to be expressed in scores. Where normality
of population can be assumed the above transformation should
apply.
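
The conversion of ranks into scores can be sketched as follows, under several stated assumptions: the per cent position is taken as 100(R - .5)/N with rank 1 as the highest, the ready-made table is replaced by the inverse of the normal curve from the standard library, and the result is expressed on a scale with mean 50 and σ 10 rather than on the scale of the published table. The function names are hypothetical.

```python
from statistics import NormalDist


def per_cent_position(rank, n):
    """Per cent position of the person ranked `rank` (1 = highest) among n."""
    return 100.0 * (rank - 0.5) / n


def rank_to_normalized_score(rank, n, mean=50.0, sd=10.0):
    """Convert a rank to a score on a normalized scale via the normal curve."""
    proportion_below = 1.0 - per_cent_position(rank, n) / 100.0
    z = NormalDist().inv_cdf(proportion_below)
    return mean + sd * z


n_persons = 5  # hypothetical group size
for rank in range(1, n_persons + 1):
    print(rank, per_cent_position(rank, n_persons),
          round(rank_to_normalized_score(rank, n_persons), 1))
```

Subtracting .5 from the rank places each person at the middle of the interval he occupies, so that no one falls at 0 or 100 per cent, where the normal curve gives no finite score.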

In test construction various specific issues arise with each stage
of the work. Relevant literature should be consulted for the 
detailed solutions. D. C. Adkins’ Construction and Analysis of 
Achievement Tests (1947) is a good text for beginners. For more 
comprehensive treatment of topics E. F. Lindquist’s Educational 
Measurement (1955) may be consulted. 


IX FACTOR ANALYSIS AS A METHOD OF RESEARCH 
What is Factor Analysis 


The application of factor analysis to psychological data is 
associated with the names of Spearman (1926), Thurstone (1935, 
1947), Thomson (1939) and Burt (1940). As a method of investigation
it has come to represent a whole sphere of work which has
been rapidly covered by an enormous output of expert literature
and some of the ablest minds in the field of psychology have given
to it their concentrated attention in recent years, so much so that
the factorist is recognized by psychological organizations and
learned associations as representing an area of specialization
different from all others. Unfortunately from the very beginning
factor psychology has been associated with esoteric if not cabalistic
practices, which on account of the formidable sets of numbers
with decimal fractions have tended to keep away the less adventurous.
Uninformed criticism has led here and there to a mild
sort of prejudice against the method as such. It is obvious that 
persons who have never carried out factor analysis are not qualified 
to criticise it and such criticism should be put aside in favour of 
a direct approach to a technique that is of fundamental value in 
psychological and educational research. ‘Factor analysis’, 
according to Thurstone (1950), ‘assumes that a variety of pheno- 
mena within a domain are related and that they are determined, 
at least in part, by a relatively small number of functional unities 
or factors.’ If the human mind or any human work or situation 
is structured at all then factor analysis is an attempt to descry 
the underlying structure. In Kendall’s scheme (1950) factor 
analysis is classed as a type of ‘inter-dependency analysis’. The
start in this method of scientific investigation is made with an 
observed set of relationships, and factors anterior to and under- 
lying those relationships are uncovered and identified. This 
attempt to derive deep-lying fundamental and naturally fewer 
entities from the observations of what is in the main surface pheno- 
mena has been figuratively described by Thurstone as an attempt 
‘to lift ourselves by our bootstraps’. In short factor analysis is 
a general method par excellence in science of identifying and at 
least tentatively describing the structure of a domain from an 
experimental observation of surface signs and trends of the 
phenomena of that domain. The suspicion against it is an 
unjust one and due largely to ignorance and unwarrantable 
apprehensions.

Factor analysis begins with the inter-correlations of a number 
of tests. Tests correlate because of underlying functional unities. 
They are to be thought of as pencils or vectors in a three-dimensional
space. They are measured in standard units (having thus
standard deviations and variances of unity) and are tied together
at the means, which are for such units 0.

In figure 9 we have six tests so ordered into a sheaf of
vectors. The correlation between any two such tests is equal
to the cosine of the angle between them. In a right-angled
triangle: Cos θ = Base/Hypotenuse.

[Figure: a sheaf of six test vectors radiating from a common origin, with an arrowhead marking the centroid]
A SHEAF OF TEST AXES WITH A CENTROID
Fig. 9

This is illustrated in the following figure.

[Figure: right-angled triangle formed on test vectors A and B, with the angle θ between them and the Base marked]

Obviously when ∠θ is 0 the base and hypotenuse coincide
and the cosine is 1, or the maximum possible correlation between
tests represented by vectors A and B. When ∠θ is 90°, the
base is 0 and the cosine or correlation is also 0. The task of the
factorist consists in introducing in the cluster of these tests a 
mathematically derived imaginary pencil (coincident with the 
arrowhead in the figure). This is the balancing point or the 
centre of gravity of the pencils of tests, the angle between this 
artificial vector and any of the others representing the degree of 
correlation between the two. This correlation is the ‘loading’ 
or ‘saturation’ of the factor in the tests. The factor is the common 
element in all the tests and ‘explains’ some of their inter-correlations; 
and a test’s correlation with it shows the extent of its share in 
the common property represented by the factor. A correlation 
tells us to what extent two tests contain common elements. The 
factor gives us the elements which are common to a set of 
tests and thereby reveals the very ground work of their mutual 
relationships. 
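
That a correlation is the cosine of an angle can be verified directly from scores. In the sketch below each test is treated as a vector of deviation scores (one coordinate per person), and the cosine of the angle between two such vectors is computed from first principles; the scores are invented and the function name is hypothetical.

```python
import math


def correlation_as_cosine(x, y):
    """Cosine of the angle between two deviation-score vectors.

    Scores are measured from their own means, so the result is the
    product-moment correlation of the two tests.
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    length_x = math.sqrt(sum(a * a for a in dx))
    length_y = math.sqrt(sum(b * b for b in dy))
    return dot / (length_x * length_y)


test_a = [1, 2, 3, 4, 5]  # hypothetical scores of five persons
test_b = [2, 1, 4, 3, 5]
r = correlation_as_cosine(test_a, test_b)
angle = math.degrees(math.acos(r))
print(round(r, 2), round(angle, 1))
```

For these figures the correlation is .80, which corresponds to an angle of a little under 37° between the two test vectors.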


The Spearman Type of Factors 


Spearman started with a box of inter-correlations and 
discovered that they tended (except for minor variations) in the 
cognitive area to obey a principle of proportionality. One 
such small ‘hierarchical matrix’ of correlations is shown
below:

Matrix of Inter-correlations among 5 Tests

Tests     1      2      3      4      5
  1             .72    .63    .54    .45
  2      .72           .56    .48    .40
  3      .63    .56           .42    .35
  4      .54    .48    .42           .30
  5      .45    .40    .35    .30


The law of proportionality requires that the cross multiplication
of any four cells forming an oblong or square (e.g. $r_{12}$, $r_{13}$ and
$r_{52}$, $r_{53}$, or $r_{14}$, $r_{15}$ and $r_{24}$, $r_{25}$) gives equal results. Thus

.54 × .40 = .48 × .45 or .54 × .40 = .72 × .30

This kind of cross multiplication of entries in four cells at the
corners of an oblong or square is called a tetrad. In a hierarchical
or proportional matrix tetrad differences are 0. So that,

.54 × .40 − .48 × .45 = 0
and .72 × .30 − .54 × .40 = 0


A mathematically determined imaginary vector for such a cluster 
of tests will show that whatever correlations have been observed 
are due to the existence of common elements among the tests 
involved, in a descending order. These common elements 
constitute the imaginary factor which probably stands for a 
functional unity in the mental processes tapped by the tests. It 
will be seen that the diagonal cells are empty. We cannot insert 
the reliability there because the reliability of a test is its self-correlation,
and the self-correlation may be due to elements peculiar
to the test which it does not share with other tests, as well as due
to the factorial elements which it does share with other tests.
We ought to insert there the ‘common factor variance’ of the
test known as its ‘communality’. In the case of the ‘hierarchical’
matrix we can find the communalities by means of the tetrad
principle, e.g. we take the tetrad

   x     .63
  .72    .56

where x is the unknown communality of test 1. This solves to

.63 × .72 − .56x = 0
or .56x = .63 × .72
and x = (.63 × .72)/.56 = .81


In this manner all the five communalities are calculated and 
they prove to be .81, .64, .49, .36 and .25. In such a situation
if we calculate the first factor by the centroid method (to be
presently described) we shall get a factor common to all five tests,
such that it explains the entire set of correlations. This result
would prove that a single factor is responsible for the correlations
which exist between tests, and since nothing else is common 
between them, each test is divisible into a component that is 
common and another that is unique to itself. This is the two- 
factor theory of Spearman. 
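
The tetrad principle lends itself to a mechanical check. The sketch below takes the ten correlations of the five-test hierarchical matrix (.72, .63, .54, .45, .56, .48, .40, .42, .35, .30), forms every tetrad difference, and finds each communality by the same cross multiplication used for test 1 in the text. The function names are hypothetical, and the choice of which two other tests to use for each communality is arbitrary, since in a perfect hierarchy any pair gives the same answer.

```python
from itertools import combinations

# Inter-correlations of the five tests (upper triangle of the matrix).
R = {
    (1, 2): 0.72, (1, 3): 0.63, (1, 4): 0.54, (1, 5): 0.45,
    (2, 3): 0.56, (2, 4): 0.48, (2, 5): 0.40,
    (3, 4): 0.42, (3, 5): 0.35, (4, 5): 0.30,
}
TESTS = [1, 2, 3, 4, 5]


def r(i, j):
    """Correlation between tests i and j, in either order."""
    return R[(min(i, j), max(i, j))]


def tetrad_differences(tests):
    """All three tetrad differences for every set of four tests."""
    diffs = []
    for a, b, c, d in combinations(tests, 4):
        diffs.append(r(a, b) * r(c, d) - r(a, c) * r(b, d))
        diffs.append(r(a, b) * r(c, d) - r(a, d) * r(b, c))
        diffs.append(r(a, c) * r(b, d) - r(a, d) * r(b, c))
    return diffs


def communality(i, tests):
    """Solve the tetrad x * r_jk - r_ij * r_ik = 0 for x."""
    j, k = [t for t in tests if t != i][:2]
    return r(i, j) * r(i, k) / r(j, k)


print(max(abs(d) for d in tetrad_differences(TESTS)))
print([round(communality(i, TESTS), 2) for i in TESTS])
```

The square roots of the communalities (.9, .8, .7, .6, .5) are the loadings of the single general factor, and the product of any two of them reproduces the corresponding correlation.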


Thurstone’s General Factorial Theorem 


But of course such harmonious results are never obtained in 
experimental situations. Some of the departures of tetrad dif- 
ferences from zero could be regarded as due to random fluctuation 
in the values of inter-correlations. If a single test broke the 
hierarchy it was considered an intruder and out of place i.e. not 
belonging in a functional sense to the family of tests which con- 
stituted the matrix, and was for this reason expunged, and the 
single general factor was calculated from the ‘purified’ battery. 
In fairly mixed batteries of tests the correlations cluster in groups
which if factorized will leave over some unexplained correlations
in the matrix. These revealed groupings by negative and positive
signs and seemed to suggest that there were secondary linkages
among certain groups of tests, which the general factor extracted
left untouched. Thurstone, therefore, proceeded to generalize
Spearman’s factor theorem so that it would not commit one to
the single general factor demonstrated by Spearman. Thurstone
proceeded to extract further factors which were common to
different sub-groups of tests. The residual matrix after the first
factor correlations had been removed was found to sum algebraically
to zero. Thurstone got over this by reflecting test vectors
showing minus signs. This meant that the signs of the cell entries
for both the row and column of the test were reversed. In this
way the matrix of residuals was got ready for a fresh round of
calculations giving a second factor showing different sizes and
signs of loadings. The problem of communality was tackled afresh
each time a new factor was calculated. An example of this
kind is worked out on pp. 116-17, which should be followed step
by step.

In applying the centroid type of factorization two problems arise 
which have been slurred over here somewhat deliberately. One 
is concerned with the insertion of communality or common factor 
variance in the diagonal cells which represent the self-correlation 
of the test to the extent it is due to the common factors. The 
general practice is to put in the highest correlation of the column 
in the diagonals. The obtained common factor loadings should 
be squared separately and summed and again inserted into the 
cells for what is known as an ‘iteration’. This means that a
second round of the same calculations is gone through with the
revised communalities. After a few iterations the communalities 
‘stabilize’, i.e. they do not show any great changes, when the 
last round is gone through and final values of factor loadings are 
calculated. The second problem concerns the decision regarding 
the abandonment of further factoring. This is the problem of 
significance of residuals after a number of factors have been taken 
out. A number of empirical formulae and criteria have been 
proposed by Tucker, Coombs, Burt (Thomson 1951) and 
McNemar (1942). These are adequately stated by Thomson 
(1951), e.g. McNemar continues to factorize residual matrices
until

$$\frac{\sigma_s}{1 - M_{h^2}} < \frac{1}{\sqrt{N}}$$

where $\sigma_s$ is the standard deviation of the residuals after s factors
(r being any residual correlation, taken only once for each test
with the sign intact) and $M_{h^2}$ = the Mean communality for s
factors.

Worked example of centroid factorization:

Tests          1       2       3       4       5
  1                   .63     .54     .36     .45
  2           .63             .51     .46     .35
  3           .54     .51             .42     .30
  4           .36     .46     .42             .20
  5           .45     .35     .30     .20
             1.98    1.95    1.77    1.44    1.30    Sum of columns without h²
              .63     .63     .54     .46     .45    Highest r of column
             2.61    2.58    2.31    1.90    1.75    Total 11.15 = 3.3392²
             .7816   .7727   .6918   .5690   .5241   1st Factor Loadings

1st Factor Matrix

   .7816     .6109   .6039   .5407   .4447   .4096
   .7727     .6039   .5971   .5345   .4397   .4050
   .6918     .5407   .5345   .4786   .3936   .3626
   .5690     .4447   .4397   .3936   .3238   .2982
   .5241     .4096   .4050   .3626   .2982   .2747

1st Factor Residual Matrix

             .0191   .0261  −.0007  −.0847   .0404
             .0261   .0329  −.0245   .0203  −.0550
            −.0007  −.0245   .0614   .0264  −.0626
            −.0847   .0203   .0264   .1362  −.0981
             .0404  −.0550  −.0626  −.0981   .1753
             .0002  −.0002   0       .0001   0       Check: Matrix sums to zero
             .0847   .0550   .0626   .0981   .0981   Diagonal entries replaced by largest r of column
             .2366   .1809   .1768   .3276   .3542   Sum of columns, with new diagonals, disregarding signs

Signs of Tests 2, 3, 4 changed to make column 5, with largest total of .3542, all positive

                      (−)     (−)     (−)
             .0847  −.0261   .0007   .0847   .0404
   (−)      −.0261   .0550  −.0245   .0203   .0550
   (−)       .0007  −.0245   .0626   .0264   .0626
   (−)       .0847   .0203   .0264   .0981   .0981
             .0404   .0550   .0626   .0981   .0981
             .1844   .0797   .1278   .3276   .3542   Total 1.0737 = 1.0362²
             .1779   .0769   .1233   .3161   .3421   2nd Factor Loadings

2nd Factor Matrix using temporary signs

   .1779     .0316   .0137   .0219   .0562   .0608
   .0769     .0137   .0059   .0095   .0243   .0263
   .1233     .0219   .0095   .0152   .0389   .0421
   .3161     .0562   .0243   .0389   .0999   .1081
   .3421     .0608   .0263   .0421   .1081   .1170

2nd Factor Residual Matrix

             .0531  −.0398  −.0212   .0285  −.0204
            −.0398   .0491  −.0340  −.0040   .0287
            −.0212  −.0340   .0474  −.0125   .0205
             .0285  −.0040  −.0125  −.0018  −.0100
            −.0204   .0287   .0205  −.0100  −.0189
             .0002   0       .0002   .0002  −.0001   Check: Matrix sums to zero

Tests      Factors             Variance          Communality   Specific   Total
           I        II         I        II                     Variance
  1       .7816    .1779      .6109    .0316       .6425         .3575      1
  2       .7727   −.0769      .5971    .0059       .6030         .3970      1
  3       .6918   −.1233      .4786    .0152       .4938         .5062      1
  4       .5690   −.3161      .3238    .0999       .4237         .5763      1
  5       .5241    .3421      .2747    .1170       .3917         .6083      1

Communalities by iteration compared with highest r of column
used as first approximation:

Tests          1       2       3       4       5
              .63     .63     .54     .46     .45    Highest r of column
             .6417   .5893   .4707   .4598   .3855   h² after three iterations

This method of factorization is known as the Centroid and is 
possibly, with the exception of the cluster analysis, the simplest 
as to actual procedure. There are several other methods of which 
the Principal Components of Hotelling (1933) and Maximum Likeli- 
hood of Lawley (1940) are noteworthy for a student of factor psy- 
chology. These have been very lucidly described by Thomson 
(1951) whose book should be consulted for details of procedure 
and the basic rationale. The Cluster Method (Tryon 1939) has 
also been briefly treated by him. 


The Problem of Rotation of Axes


When two factors are found as in the example above, the relations
of the tests and the factor axes can be represented graphically
as shown on page 119.

In this figure some of the test points occur in the lower right 
hand quadrant of the graph paper because they carry minus load- 
ings in Factor II by reason of the sign changes in those test vectors. 
And secondly the factor axes I and II are at right angles to each 
other being uncorrelated and statistically independent. Such 
factors are known as orthogonal to distinguish them from oblique 
ones which show correlations of various degrees. It is mathema- 
tically possible to shift the factor axes to new positions using the 
origin 0 as a hinge. If the right angle between original centroid 
factor axes is held and the two arms rigidly moved upwards and 
left or downwards and right we have an orthogonal rotation. When 
the arms representing the factors are moved so that the angle
between them departs from its orthogonal position, we get an
oblique rotation. There is some controversy as to whether rotation
should be made.

[Figure: test points plotted against factor axes I and II]
GRAPH SHOWING TEST POINTS ON FACTOR AXES


It will be remembered that the factors are like averages and 
are derived from the matrix of obtained correlations. They 
represent the central stick of an umbrella with the needles sticking
out around it, each representing a test. The school of thought
which favours rotation points out that the obtained factor axis
is a mere ‘average’ and represents nothing more than the common
elements among the tests on which it is based. It certainly does
not represent any identifiable psychological entity. They think
that on rational and empirical judgement the tests offer a much
more meaningful and concrete example of mental functions in
operation. If then the axes are shifted to positions which can be 
given a clearer, purer psychological meaning in the light of the 
tests, they can acquire tentatively a psychological status and can 
be evaluated as a function that underlies the actual experimental 
tests. Another argument for rotation is based on the fact that 
in factors, after the first, tests show minus signs. It is against 
the known economy of nature that there should be any mental 
traits which are favourable to achievement in certain activities 
and positively hinder success in others. This result is due to 
Thurstone’s reflection-of-axes procedure and has necessarily no 
reference to real psychological entities. Those who do not favour 
rotation have actually carried out an analysis the utility of which 
is very much in question. The present writer is of the view that 
all factorization aims at simplification of explanatory concepts 
in the sphere of achievement and ability, and once committed 
to analysis we must go the whole hog to secure a better under- 
standing, if not of the abilities of man, at least of the nature of 
the experimental tests. If one is not willing to rotate on any principle
whatsoever one had better not factorize at all, because the
factors themselves are mere mathematical artifacts both as to
the signs they carry and the size of the loadings. The only thing
completely determined by the data and invariant in the situation, 
is the complex web of mutual relationships among tests in reference 
to the common factor axes. The basic real thing in the analysis 
is the scatter of test points on the graph paper. These have come
to occupy their places by reason of the two common elements 
(factors) they differently shared, which account for their inter- 
correlations in good part and are represented in the arms of the 
factor axes I and II. Factors are unknown ‘centres of gravity’ 
and acquire definition and psychological meaning in the light 
of the known and concrete tests. Without rotation, therefore, 
they remain mere mathematical entities of no more than theore- 
tic interest whereas the real purpose of analysis is to identify unitary 
if not also primary and basic mental functions.

The trouble about rotations is not their justification but their 
unique solution, for, the researcher is wholly free to rotate them 
to any position he likes. Thurstone (1935) has furnished a cri- 
terion of good rotation in his concept of simple structure if our aim 
is to define factors uniquely; this demands ‘a rotation which will
leave every axis at right angles to at least as many tests as there
are factors, and every test at right angles to at least one axis’ (Thomson
1951). ‘Being at right angles’ means that the loading for
the test in the factor is zero; the requirement, therefore, is that 
in the rotated position: (1) No test will include in its factorial 
structure all the factors, i.e. it must miss out in at least one factor; 
(2) No factor will show a loading through all the tests, i.e. there 
should be no general factor. Thurstone held that the ‘simple 
structure’ concept is in agreement with the principle of parsimony 
in explanatory hypothesis. The matrix on page 122 illustrates 
the principle of ‘simple structure’. 


Even so rotation to unique positions is not mathematically 
possible. The only justification is the insight and intuition of 
the researcher or the confidence we may have in a particular 
test of a known function to use it as a point of reference. By pas- 
sing the axis of one factor through the point of such a test we satu- 
rate it entirely with that factor to the exclusion of the other factor 
which now has a zero loading in it. Such a rotation is illustrated 
in the figure on page 122.
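
An orthogonal rotation of this kind is a matter of elementary trigonometry. The sketch below takes the two sets of centroid loadings from the worked example and turns both axes rigidly about the origin until axis I passes through the point of test 5, so that test 5 carries a zero loading in Factor II. The function name is hypothetical, and the choice of test 5 as the point of reference follows the caption of the figure rather than any rule.

```python
import math

# Unrotated centroid loadings (Factor I, Factor II) from the worked example.
loadings = [
    (0.7816, 0.1779),
    (0.7727, -0.0769),
    (0.6918, -0.1233),
    (0.5690, -0.3161),
    (0.5241, 0.3421),
]


def rotate_through(points, test_index):
    """Rigidly rotate both axes so that axis I passes through one test point."""
    a, b = points[test_index]
    phi = math.atan2(b, a)  # angle between old axis I and the test vector
    c, s = math.cos(phi), math.sin(phi)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]


rotated = rotate_through(loadings, test_index=4)  # test 5
for number, (a, b) in enumerate(rotated, start=1):
    print(number, round(a, 4), round(b, 4))
```

The sum of the squared loadings of every test, that is its communality, is the same before and after the rotation; only its division between the two factors changes.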


Thurstone is willing to move his factor axes to oblique positions
to achieve his ‘simple structure’. In doing so we create a situation
where the factor loadings are no longer the correlations between
factors and tests (= the cosine of the angle between their
vectors). We also create a new kind of axes known as the primary
vectors which are not the same as the rotated factor axes known
as ‘reference vectors’. This departure from the orthogonal position
also leads to the factors losing their independence and getting
correlated to the extent of the cosine of the angle between them.
Since factors are correlated we can get a matrix of factor inter-correlations.
This is again factorized and leads to what are known
as Second Order Factors. It would seem as if in this way we are 
back to the ‘g’ or general factor of Spearman, the primary abilities 
being themselves correlated. The main difference between 
Spearman’s original position and Thurstone’s point of view is 
a difference in emphasis: Spearman by all the means at his disposal 
would maximize a first general factor; Thurstone would maxi- 
mize instead the PMA type of group factors and through their 
obliquity arrive at a minimum of a second order factor like 
the ‘g’.

MATRIX EXEMPLIFYING THURSTONE’S SIMPLE STRUCTURE

Tests    Loadings on Factors I, II, III
  1        .80
  2        .50    .50
  3        .70    .20
  4        .60
  5        .40    .50
  6        .70

[The column positions of the loadings are not recoverable here; every test has a zero loading in at least one factor.]

[Figure: factor axes I and II rotated about the origin to new positions]
ROTATION OF FACTOR AXES TO NEW POSITIONS
TO LOAD TEST 5 ENTIRELY WITH FACTOR I


Reversal of Perspective in factor analysis


DIMENSIONALITIES : P,T,O. 


Factor analysis is a general scientific technique of wide appli- 
cability and a number of novel applications of it have been thought 
of and suggested. In the educational-psychological situation 
three categories of variables can be conceived as interacting. 
These are persons, occasions and tests. In the ordinary factor analysis
it is tests that are correlated for various sizes of N of persons.
Consider the diagram below as illustrating this triple relationship.

[Diagram: the three variables P (persons), T (tests) and O (occasions) shown as mutually related]
INTER-RELATED VARIABLES OF PERSONS, TESTS AND OCCASIONS


It is obvious that we can use a population of any one of this
triad for the N of our sample. Correlations could be between
persons or occasions as much as between tests, and the N could
be either of the other two. Stephenson (1935, 1953) has proposed 
the use of the correlations of persons for factorization as a systema- 
tic and more realistic approach to the mind of man. Burt (1937) 
has tried to assimilate these radical proposals within the frame- 
work of ordinary factor theory by pointing out that it amounts 
in effect to looking at a score matrix the transverse way and 
“reversing the roles” of persons and measures (Thomson 1951). 
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Stephenson, however, has claimed for his Q-technique the status 
of a new method in psychology and one that is not a mere trans- 
pose of the old R-method: It is credible that a hypothetical 
factor among several persons may be found and may represent 
a sort of personality that is central to the persons correlated. 
Q-technique is of interest to psychologists in the field of personality 
and clinical work, and the interested are referred to Stephenson’s 
book The Study of Behaviour (1953) which outlines it. We can — 
have correlations of occasions, holding persons or tests constant, 
i.e. the same person takes several tests on different occasions, V 
being the number of tests or the same test given to several persons 
on different occasions, WV being this time the number of persons, 
In both cases occasions are correlated. This kind of approach 
is of some advantage in analysing the effect of repetition and time. 
Problems in the area of learning and practice effects could be 
tackled quite ingeniously by this method. It appears at first 
sight that a sizeable WV of tests is rather improbable. Stephenson 
in his Q-technique does not propose the use of tests. He substi- 
tutes for them various statements which, each person or the same 
person expressing himself from different angles scores by sorting 
out into a Q-sort. The sample WV therefore consists of statements 
such as: “TI am generous to a fault”’ or ‘“—keeps going only by vir- 
tue of constant inner struggle” ; and quite substantial Ws of these 
are easily secured. 
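
The "transverse" reading of a score matrix can be made concrete with a
small sketch. The scores below are invented purely for illustration,
and the function names `r_technique` and `q_technique` are hypothetical
labels, not terms from Stephenson or Burt. The sketch only shows that
the same matrix yields test correlations when read by columns and
person correlations when read by rows; it does not settle whether
Q-technique is more than a transpose of the R-method.

```python
import numpy as np

# Hypothetical score matrix: rows are persons, columns are tests
# (or Q-sort statements). The values are invented for illustration.
scores = np.array([
    [12.0, 15.0, 9.0, 20.0, 11.0],
    [14.0, 13.0, 10.0, 18.0, 12.0],
    [9.0, 17.0, 14.0, 11.0, 16.0],
    [10.0, 16.0, 13.0, 12.0, 15.0],
])


def r_technique(matrix):
    """Correlate tests (columns); persons supply the N."""
    return np.corrcoef(matrix, rowvar=False)


def q_technique(matrix):
    """Correlate persons (rows); tests or statements supply the N."""
    return np.corrcoef(matrix, rowvar=True)


r_corr = r_technique(scores)  # 5 x 5 matrix of test intercorrelations
q_corr = q_technique(scores)  # 4 x 4 matrix of person intercorrelations
```

Either matrix could then be passed to the same factorization routine;
only the roles of persons and measures have been exchanged.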

The invariance of factor matrices transformed by rotations is one
of the more troublesome difficulties of the factorial technique.
The simple structure of Thurstone is not objectively determined,
and analytical methods which give full determinacy are
psychologically meaningless. As a result of this weakness the
fundamental task of factor analysis, viz. that of uncovering the
underlying determinants of surface psychological phenomena, has
been rendered more or less unattainable. If there is no agreement
between what is identified as a factor by one worker and by another
working separately, the very objective of factor work is lost. That
a fair degree of observable comparability exists has been brought
out by the comprehensive survey by French (1951) of the results of
factor analyses up to the year of his publication. Repeated
identifications encourage the hope that the subjectivity influencing
rotations is not such as would nullify the advantages of insight and
experience. This satisfaction is a matter of subjective impression
based on at best a rough-and-ready count. A more formal approach to
this problem was made by Tucker (1951).


THE UNIFIED FACTOR MODEL 


Ahmavaara (1954, 1957; Ahmavaara and Markkanen 1958) has presented
a case for what he calls ‘The Unified Factor Model’ which according
to him furnishes “an essential and important completion of the
multiple-factor analysis (as it was developed by Thurstone)”. His
‘Unified factor analysis’ comprises three steps, the first two being
already familiar as due to Thurstone and a third which is due to
Ahmavaara. These are: (1) Factorization (2) Rotation
(3) Transformation (Ahmavaara and Markkanen 1958). The object of
the third step is to synthesize the results of different factor
analyses. The steps of this procedure of transformation have been
serially stated by Ahmavaara (1957) and require us

(1) to find tests common to two factor analyses based on two groups,

(2) to write out the factor matrices $F_1$ and $F_2$ for both,

(3) to compute the ‘transformation matrix’ $L$ by the formula¹

$$L = (F_1' F_1)^{-1} F_1' F_2$$

where $F_1$, $F_2$ are the two rotated factor matrices, $F_1'$ is the
transpose of $F_1$ and $(F_1' F_1)^{-1}$ is the inverse of the
transpose of $F_1$ post-multiplied by $F_1$,

(4) to normalize² the matrix $L$ by rows,

(5) to perform the matrix multiplication $F_1 L$, and finally

(6) to compare matrix $F_1 L$ with $F_2$.

This test is made by plotting the elements on a graph against
orthogonal axes $F_1 L$ and $F_2$. This gives a point scatter around
the line X = Y, close scatter indicating agreement of the two
analyses.
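
The six steps can be traced in a few lines of matrix arithmetic. What
follows is a minimal sketch, assuming the transformation matrix is the
least-squares solution $L = (F_1'F_1)^{-1}F_1'F_2$; the loadings are
invented, the second analysis is manufactured by turning the first
through a known angle, and the function names are hypothetical. It is
meant only to show the mechanics, not to reproduce any published
analysis.

```python
import numpy as np


def transformation_matrix(f1, f2):
    """L = (F1' F1)^-1 F1' F2, the least-squares solution of F1 L = F2."""
    return np.linalg.solve(f1.T @ f1, f1.T @ f2)


def normalize_rows(l_raw):
    """Scale each row of L to unit length (step 4)."""
    return l_raw / np.linalg.norm(l_raw, axis=1, keepdims=True)


# Hypothetical rotated loadings of four common tests on two factors (group 1).
f1 = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.2, 0.7],
])

# For the sketch, group 2 is group 1 seen from axes turned through 20 degrees.
theta = np.deg2rad(20.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
f2 = f1 @ rotation

l_matrix = normalize_rows(transformation_matrix(f1, f2))  # steps 3 and 4
f1_l = f1 @ l_matrix                                      # step 5
discrepancy = f1_l - f2                                   # step 6
```

Because the second matrix was built from the first, the discrepancies
vanish and every point would fall on the line X = Y; with real data the
size of the discrepancies measures the disagreement of the two analyses.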


1 For an elementary knowledge of matrix algebra Thurstone’s Multiple
Factor Analysis (Mathematical Introduction) and H. W. Turnbull’s
Theory of Determinants, Matrices and Invariants or other texts on
Algebra may be consulted.

2 See Thomson, 1951, p. 159.


Ahmavaara (Ahmavaara and Markkanen 1958) goes on to make further
proposals for the study of factorial differentiation in analyses by
the computation of a ‘difference matrix’ $\Delta$.¹ By these means
we could study the factorial invariance among different groups.
Differentiation indices, total and partial, give a detailed and
over-all idea of the difference between the analyses of two groups.
A whole community or group could be treated as ‘standard’ and a
‘theory of group differentiation’ of factors could be put forward
objectively. Another type of difference is the remoteness of single
test points from the line X = Y on the plot of axes $F_1 L$ and
$F_2$. This is considered by computing a ‘d matrix’² which indicates
“the magnitude of abnormal transformation in each test”. Abnormal
deviation of a test would suggest that the psychological meaning of
the test changes from group to group. This kind of technique requires
fairly advanced mathematics and is open to those who have a good
understanding of Thurstone’s rationale and familiarity with
elementary matrix algebra. Ahmavaara claims for his transformation
theory objectivity and mathematical uniqueness. It may however be
noted in passing that the transformation process is unexceptionable
whereas the rotations prior to it are non-determinate as before and
do not banish the fear that we may be guilty of seeking a fortuitous
agreement between two cases of mistaken identity.

1 Computed by the formula $\Delta = L L' - R_{pq}$, where $L$, $L'$
are the normalized transformation matrix and its transpose for
group 2 and $R_{pq}$ is the rotation matrix post-multiplied by its
transpose for group 1.

2 Computed by the formula $F_1 L - F_2$ with the aforesaid meanings.


Guttman’s Radex Analysis


Guttman has already been mentioned for his original, highly
perfectionist and eminently logical excursus in connection with
attitude scaling. His Radex concept (Guttman 1954) has received
enough notice and comment to merit a brief reference at the close
of this short survey of factor methods. We have considered the
classification of research data into observations made within the
three dimensional space of persons, tests and occasions (or trials)
for purposes of correlation and factorization. It was then pointed
out that correlations among occasions or periodic trials could be a
rewarding approach to the study of practice effect and learning
process. Guttman’s Radex has very good application to such a task.
Both Guttman and Ahmavaara rationales take the student far afield in
factor analysis, demanding of him a good background of statistics
and algebra. Jones (1959), arguing for the Guttman approach,
criticizes Thurstone for trying “to lift himself by the bootstraps”.
He quotes Reichenbach’s distinction between ‘the context of
discovery’ and ‘the context of justification’. Generation of
hypotheses belongs according to him to the former, whereas
statistical treatment and finesse belong to the latter, which
provides the testing of hypotheses. Thurstone’s search for simple
structure amounts, he feels, to generating hypotheses from the heart
of pure mathematical jugglery.

Guttman and his supporters go back to Spearman at least in respect
of point of departure. They wish to postulate designs in the matrix
of obtained correlations and seek to test the validity of their
theoretic constructs by means of observed data. It may be recalled
that Spearman postulated his general and specific factors and looked
for hierarchical matrices which would justify them. Guttman looks
for the ‘simplex matrices’ which he regards as the ideal
configuration for correlations among several trials of the same test
with the same persons. During sessions of training such periodic
tests are liable to correlate systematically in a manner productive
of Guttman’s simplicial form. The simplex matrix gives “a sequence
of stages each one of which is stacked within the next like the
sections of a telescope” (Jones 1959). This means that the common
factor variance is arranged in a box-within-box like order, so that
later tests contain all the factor variances of earlier ones. This
situation is reminiscent of Spearman’s hierarchy but is somewhat
different in detail. Spearman produced the correlation between any
two tests from their ‘g’ saturations. Thus

$$r_{12} = r_{1g} \cdot r_{2g}$$

Guttman similarly determines in the simplex the correlation of two
tests by a third which mediates between them as to factorial
complexity. Thus in a simplex

$$r_{ik} = r_{ij} \cdot r_{jk}$$

The partial correlations $r_{12.g}$ and $r_{ik.j}$ are in both cases
zero, which means that the entire correlations $r_{12}$ and $r_{ik}$
are determined by the mediating function or stage ‘g’ and j
(i < j < k in the simplex). Spearman employed the principle of
proportionality in his matrices; Guttman replaces this by the
principle of contiguity which requires that the immediate neighbours
of the test show high correlations with it. In his theory of scaling
Guttman was forced to compromise with reality by compounding his
perfect scales with quasi-scales, and in the case of factor analysis
he is compelled to admit error components in his simplex matrix, for
which he provides the quasi-simplicial form. The correlations are in
a simplex of the form:

$$r_{ij} = r_{i,i+1} \times r_{i+1,i+2} \times \cdots \times r_{j-1,j} \quad (i < j)$$

e.g. in a five stage matrix

$$r_{15} = r_{12} \cdot r_{23} \cdot r_{34} \cdot r_{45}$$


A perfect simplex of Guttman is given below as an example of this
principle:


A PERFECT SIMPLICIAL MATRIX

        1      2      3      4      5

1              .40    .24    .18    .16
2                     .60    .45    .41
3                            .75    .68
4                                   .90
5

Applying the five stage formula for $r_{ij}$ we get here:
$r_{15} = .40 \times .60 \times .75 \times .90 = .16$


The simplex thus may be represented as a string with beads strung 
in it for tests. Many such quasi-simplexes can be detected in 
experimental matrices. The correlations as we have seen tend to 
decrease as they move away from the diagonal towards the right 
hand top corner of the matrix. 
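
The chain rule of the simplex is easily checked by machine. The sketch
below is a minimal illustration and assumes only the four correlations
between adjacent stages read off the matrix above (.40, .60, .75, .90);
the function name `perfect_simplex` is hypothetical. Every other entry
is obtained as the product of the adjacent correlations lying between
the two stages.

```python
import numpy as np


def perfect_simplex(adjacent):
    """Build a perfect simplex from correlations of adjacent stages.

    adjacent[i] is the correlation between stage i+1 and stage i+2.
    """
    n = len(adjacent) + 1
    r = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            # r_ij is the product of the adjacent correlations between i and j
            r[i, j] = r[j, i] = np.prod(adjacent[i:j])
    return r


# Adjacent correlations read off the example matrix.
simplex = perfect_simplex([0.40, 0.60, 0.75, 0.90])
```

Rounded to two places the entries agree with the printed matrix, and
they fall away steadily from the diagonal towards the corner, as the
string-of-beads picture requires.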

A second form diagnosed by Guttman is known as the Circumplex, in
which correlations first decrease from the diagonal but rise again
before they reach the right hand top corner. The string models of
simplex and circumplex are represented on spheroidal surfaces in the
following figure.


SPHEROIDAL REPRESENTATION OF SIMPLEXES AND CIRCUMPLEXES
(2 SIMPLEXES, 2 CIRCUMPLEXES)


If the distance between the points is inversely proportional to the
correlation between tests, it will be seen that the models would
give rise to the kind of matrices described above. A Radex is a
combination of the simplex and circumplex type of matrices, as
suggested in the following figure:


A RADEX SHOWING 3 CIRCUMPLEXES WITH 3 SIMPLEXES


The radex is no more objectively and uniquely determinable than the
‘simple structure’. It is Guttman’s idea that nature in certain
situations offers us the radex type of correlational configurations.
That is a hypothesis he would test against the data. Both Thurstone
and Guttman try to verify and demonstrate an underlying design in
nature and both methods are permissible if proof is forthcoming that
the fit of the data and ideal configuration is good. The student who
wishes to apply Guttman’s method to experimental correlations,
specially in the field of practice effect and learning, will find
the details in Guttman (1954, 1955) and Jones (1959). Guttman’s
scalogram analysis and the radex type of analysis await wide
experimentation and verification by research workers in education
and psychology.


THE STATISTICAL METHODS OF
RESEARCH IN EDUCATION AND
PSYCHOLOGY: PREDICTION
AND DECISION PROCESSES


X MULTIVARIATE DEPENDENCIES AND PREDICTION
The Nature of Variables


One of the functions of research in education and psychology is that
of investigating the inter-dependencies among variables, both
discrete like sex or nationality and continuous like height or
mental ability. “Knowledge”, said Comte, the positivist philosopher,
“is Foresight, and Foresight is power”. It is an advantage to know
the relationships of things because that enables us to surmise the
state of what is unknown from related factors which we know, on the
basis of a probability calculus. Such predictive functions of
variables have been widely investigated and are employed profitably
in a variety of situations. Four types of processes are visualized
by statistical theory (Guttman 1941, Guilford 1956):


Unknown                          Known
(Dependent, Predicted)           (Independent, Predictor)

Attribute in discrete            Attribute in discrete
categories                       categories

Quantitative, continuous         Attribute in discrete
variate                          categories

Attribute in discrete            Quantitative, continuous
categories                       variate

Quantitative, continuous         Quantitative, continuous
variate                          variate


In educational and psychological research of the type that is
concerned with individual differences, the situation frequently
arises where from known variables unknown ones are to be estimated
on the basis of relations established in experimental groups. Any
of the four types of set-up tabulated above may occur and require
a solution. Systematic treatment of these has been presented by
Guttman (1941) in a somewhat technical form. It is necessary for
the research student to obtain a good understanding of all such
solutions because of their practical value and their recurrence as
themes of research.


Prediction of Attribute from Attribute


Attributes can be predicted from other attributes on the principle
of maximum probability. Since only discrete classes like sex,
mother-tongue or religious affiliation are involved we can deal only
with the enumerative type of data in several classes. The prediction
is then made on the basis of the mode, or the class in which the
highest frequency occurs. An example will simplify the actual
procedure, which has been ably stated in terse symbolism by Guttman
(1941) and explained with examples by Guilford (1950).


TABLE OF PROPORTIONS OF BIVARIATE DISTRIBUTION A, B


                          ATTRIBUTE A

                   A1     A2     A3     A4      b

             B1    .04    .08    .03    .05    .20
             B2    .06    .02    .04    .06    .18
Attribute B  B3    .04    .04    .10    .04    .22
             B4    .09    .06    .05    .01    .21
             B5    .02    .05    .08    .04    .19

             a     .25    .25    .30    .20   1.00


Without a knowledge of $A_i$ (i.e. any sub-category i of attribute A)
we will plump for the mode of B, viz. sub-category $B_3$, which
gives a p of .22. But if we knew $A_i$ we would then predict the
sub-category of B in which the mode of the given column occurs.
Modes for $A_1$ to $A_4$ are:


Known Attribute        Mode in $B_j$
A1                     .09 (B4)
A2                     .08 (B1)
A3                     .10 (B3)
A4                     .06 (B2)
Total                  .33


Thus with a knowledge of $A_i$ and the scatter diagram of
frequencies our prediction has improved somewhat.

Guttman proposes a simple coefficient as a measure of this
improvement in prediction of B from a knowledge of A and of its
relationship with B. Symbolically, when we are predicting B from A,
the coefficient is

$$\frac{\sum_{k=1}^{m} P_{jk} - b}{1 - b}$$

where $P_{jk}$ is the largest entry in column k, summation being over
all columns of the m sub-categories of A, and b is the largest of the
b column totals. Substituting the values in the example,

$$\frac{.33 - .22}{1 - .22} = .14$$

Similarly the coefficient in predicting A from B becomes

$$\frac{\sum_{j=1}^{n} P_{jk} - a}{1 - a}$$

where $P_{jk}$ is now the largest entry in row j, summation being
over the n sub-categories of B, and a is the largest of the a row
totals; the coefficient comes to

$$\frac{.41 - .30}{1 - .30} = .16$$

It may be noted that the efficiency of prediction is not the same
both ways.
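
The two coefficients can be recomputed directly from the table of
proportions. The sketch below is illustrative only: the function name
`predictive_coefficient` is hypothetical, and the first row of the
table is taken as .04, .08, .03, .05, which is the reading that agrees
with the marginal totals and with the modes quoted in the text.

```python
import numpy as np

# Proportions from the bivariate table: rows B1..B5, columns A1..A4.
# The first row is an assumed reading consistent with the marginal totals.
p = np.array([
    [0.04, 0.08, 0.03, 0.05],
    [0.06, 0.02, 0.04, 0.06],
    [0.04, 0.04, 0.10, 0.04],
    [0.09, 0.06, 0.05, 0.01],
    [0.02, 0.05, 0.08, 0.04],
])


def predictive_coefficient(table, predict_rows=True):
    """Proportional gain of modal prediction over ignoring the predictor.

    predict_rows=True predicts the row attribute (B) from the column
    attribute (A); predict_rows=False predicts A from B.
    """
    if not predict_rows:
        table = table.T
    largest_marginal = table.sum(axis=1).max()  # mode of the predicted attribute
    sum_of_modes = table.max(axis=0).sum()      # best guess within each known class
    return (sum_of_modes - largest_marginal) / (1.0 - largest_marginal)


b_from_a = predictive_coefficient(p, predict_rows=True)   # (.33 - .22) / (1 - .22)
a_from_b = predictive_coefficient(p, predict_rows=False)  # (.41 - .30) / (1 - .30)
```

The two values differ, which is the asymmetry remarked upon above:
knowing A helps in guessing B to a different extent than knowing B
helps in guessing A.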

Similarly we can predict an attribute from the knowledge of a set
of attributes provided that the distribution of frequencies of an
experimental group is known. We may be required to predict, say, A
from B, C, D, E which are several attributes with sub-categories.
For a given individual we know his values in B, C, D, E associated
in the experimental group with A. He would then be predicted in the
largest sub-frequency of this vector or arm of sub-frequencies of A.
For example, a person may belong to a group characterized by a
particular combination of sub-categories of B, C, D and E. All such
persons are divided over the m sub-categories of A. The category of
A containing the highest number of cases, or the mode, is our safest
bet in prediction for a case so characterized in the other
attributes.

We would here be working with the crudest kind of measure, viz. the
nominal, and the resulting prediction process is also equally
simple.


Prediction of Quantity from Attribute


Formulation of solution in a problem of this nature is based on the
concept of ‘least squares’, viz. that the squared errors of
prediction are a minimum. If we have an experimental distribution
and its mean and are then given a case drawn at random from the same
population, our best guess about him would be that he carries the
mean value in respect of the variable in question. The squared errors
of prediction from the mean are a minimum over all cases. In the
present case our original distribution on the quantitative variate
is subdivided into the several sub-categories of an attribute A.
Thus we have several sub-samples within the entire experimental
sample. Let the means of the sub-samples pertaining to
sub-categories of A be $M_i$. Knowing a person’s sub-category in A,
the prediction on the quantitative variate is the mean of that
sub-sample. Thus category $A_i$ predicts $M_i$ as the value of the
quantitative variate that is likely to reduce mistakes to a minimum.

For example, we may have students categorized according to the
schools they come from. Then if a Geography achievement test is
administered to comparable samples, four means for four schools
A, B, C, D will be available with an overall mean for all samples
pooled together. About a student who was not a member of any of the
experimental samples, we know only the school from which he comes,
i.e. his membership of a discrete category. Our best guess of his
achievement in Geography is the mean of his experimental
school-sample. The standard deviation of that group represents the
accuracy of this prediction. The standard error of estimation
derived from the sum of the discrepancies between estimate (i.e.
category mean) and obtained values is given by the formula:

$$\sigma_{yx} = \sqrt{\frac{\sum_{i=1}^{k} N_i \sigma_i^2}{N}}$$

where $N_i$, $\sigma_i^2$ are the N and standard deviation squared
for group i, summation being from 1 to k groups;

N is the general N for the combined groups in all categories;

$\sigma_{yx}$ is the error of estimation that occurs when y, the
continuous variable, is estimated from x, the school affiliation.
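
A short sketch may make the school example tangible. The scores are
invented, the function names are hypothetical, and the standard error
is taken in the form $\sqrt{\sum N_i \sigma_i^2 / N}$ with each
$\sigma_i$ computed about its own school mean; none of the figures
comes from an actual investigation.

```python
import numpy as np

# Hypothetical Geography scores for experimental samples from four schools.
samples = {
    "A": np.array([52.0, 58.0, 61.0, 49.0]),
    "B": np.array([66.0, 70.0, 63.0, 69.0, 72.0]),
    "C": np.array([45.0, 50.0, 47.0]),
    "D": np.array([58.0, 62.0, 60.0, 64.0]),
}


def predict(school):
    """Best least-squares guess: the mean of the student's school sample."""
    return samples[school].mean()


def standard_error_of_estimate(groups):
    """sqrt(sum(N_i * sigma_i^2) / N), sigma_i being the SD within group i."""
    total_n = sum(len(g) for g in groups.values())
    pooled = sum(len(g) * g.var() for g in groups.values())
    return np.sqrt(pooled / total_n)


se = standard_error_of_estimate(samples)
guess_for_school_a = predict("A")
```

The standard error so obtained is simply the root mean square of the
discrepancies between each pupil's score and the mean of his own
school, which is what the formula expresses.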

Guttman (1941) has shown that, when instead of having one class of
sub-categories (like our several schools above) we have several
classes of attributes each having sub-categories, no essential
change occurs in the predictive process except that the number of
combinations of the several sub-categories becomes large. Each
combination has a sub-sample, the mean of which is the best
predicted value for persons characterised by such combination.


Prediction of Attribute from Quantity 


In this case we know the relationship in an experimental sample and
know a person’s value on the continuous variate which is the
predictor. To which group characterized by an attribute should he
be assigned? Guttman plots the curves for the quantitative,
continuous variable separately for the several groups characterized
by sub-categories of the attribute. The following figure illustrates
this.


OVERLAPPING CURVES ON A CONTINUOUS PREDICTOR FOR 3 GROUPS
WITH ATTRIBUTES A1, A2, A3
(base line: quantitative predictor, with critical points P1 and P2)


Perpendiculars dropped on the base line representing the
quantitative variable from the points of intersection of the curves
give the values of the continuous variable which separate the
overlapping groups with minimum misclassifications. The persons
obtaining these values can equally belong to either of the
overlapping groups. Persons below the point $P_1$ are more likely to
be in group $A_1$ than group $A_2$. This conclusion emerges from the
fact that as we get away from $P_1$ to the lower end of the scale
the frequency of $A_2$ persons falls and that of $A_1$ persons
rises. Even so, in assigning all persons with value $x_i < P_1$ to
category $A_1$ we are misclassifying a number of people who belong
to the shaded area in the figure above. Similarly several persons
will be misclassified into $A_2$ though they belong to $A_1$. $P_1$
and $P_2$ are critical values of the predictor variable which is
quantitative and continuous. Guilford (1950) gives the following
formula for the critical value between two overlapping groups
characterized by differentiating attributes:


X. ae ( | Te ;) 
oP a 


where $M$ = the mean of the entire distribution of the combined
groups

$p$ = proportion of total population in the category having the
higher mean score on X

$q = 1 - p$

$M_p$ = mean of X values for category higher on X
$M_q$ = mean of X values for category lower on X
$\sigma^2$ = variance in the total distribution of X.


Guilford (1952) makes many ingenious suggestions for presenting
facts relating to group overlapping and his book will be of much
interest to those seeking solutions to problems of this nature.
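
The geometric idea of dropping a perpendicular from the point where
two curves cross can be sketched numerically. The example below is not
Guilford's formula; it is a minimal illustration under the added
assumptions that both groups are normally distributed with a common
standard deviation and that each curve is weighted by its group's
share of the population. All names and figures are hypothetical.

```python
import math


def critical_value(mean_low, mean_high, sd_within, p_high):
    """Score at which two weighted normal curves of equal SD intersect."""
    p_low = 1.0 - p_high
    midpoint = (mean_low + mean_high) / 2.0
    # The cut shifts away from the larger group, which claims more of the scale.
    shift = sd_within ** 2 * math.log(p_low / p_high) / (mean_high - mean_low)
    return midpoint + shift


def weighted_density(x, mean, sd, weight):
    """Height of a normal curve scaled by the group's share of the population."""
    z = (x - mean) / sd
    return weight * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical groups: 30 per cent of cases fall in the higher-scoring category.
cut = critical_value(mean_low=40.0, mean_high=55.0, sd_within=10.0, p_high=0.3)
```

At the value `cut` the two weighted curves have equal height, so a
person scoring exactly there could equally belong to either group;
with equal group sizes the cut falls midway between the two means.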




Discriminant Function 


When a number of quantitative variables are to be jointly employed
to predict affiliation to one of two categories, a mathematically
somewhat more advanced method is available to the research worker
(Johnson 1950, Goulden 1959). This is in fact a problem in
regression with a dichotomous criterion. For example we may have the
scores of Arts and Science groups of students on three tests. They
will produce different typical profiles on them if the tests are
calculated to do so. From these data we wish to be in a position to
assign a new student, on the basis of his test scores, to one of the
two groups. In such a situation the discriminant function will
answer the purpose. The procedure is to set up normal equations
equal in number to the variables, using the inter-correlations and
dispersions, and derive weights known as λ’s. Two limits for the
contrasted groups a and b are set up by

$$U_a = \lambda_1 M_{1a} + \lambda_2 M_{2a} + \lambda_3 M_{3a}$$

and the corresponding expression $U_b$ for the second group, the M’s
being the group means on the three tests. For a new case, $U'$ is
calculated by applying the same weights to his three scores.
Thereafter $U'$ is compared to

$$\frac{U_a + U_b}{2}$$

Exceeding it he goes to the second group; falling below it he stays
in the first.


The weights in the discriminant function are calculated so as to
sharpen the difference between the typical profiles of the two
groups, which facilitates the allocation of a person to either of
the two categories. The discriminant function is a very useful tool
and has ready application to many problems in education, and it is
unfortunate that the greater familiarity with and popularity of the
regression weights have tended to its neglect, although it serves a
situation to which the ordinary regression weights do not apply.
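
The allocation procedure can be sketched as follows. The sketch uses
Fisher's formulation of the two-group discriminant function (weights
obtained from the pooled within-group covariances and the difference of
group means), which is an assumption about the method and may differ in
detail from the computing routine of Johnson or Goulden. The scores and
all names are invented for illustration.

```python
import numpy as np


def discriminant_weights(group_a, group_b):
    """Fisher's weights: pooled within-group covariance inverse times mean difference."""
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    dev_a, dev_b = group_a - mean_a, group_b - mean_b
    pooled = (dev_a.T @ dev_a + dev_b.T @ dev_b) / (len(group_a) + len(group_b) - 2)
    return np.linalg.solve(pooled, mean_b - mean_a)


# Hypothetical scores of Arts and Science students on three tests.
arts = np.array([[62.0, 41.0, 50.0], [58.0, 45.0, 47.0], [65.0, 39.0, 52.0],
                 [60.0, 44.0, 49.0], [63.0, 42.0, 46.0]])
science = np.array([[48.0, 60.0, 51.0], [52.0, 63.0, 49.0], [45.0, 58.0, 53.0],
                    [50.0, 61.0, 48.0], [47.0, 64.0, 52.0]])

weights = discriminant_weights(arts, science)
limit_arts = arts.mean(axis=0) @ weights        # U for the typical Arts profile
limit_science = science.mean(axis=0) @ weights  # U for the typical Science profile
cutoff = (limit_arts + limit_science) / 2.0


def allocate(scores):
    """Assign a new student by comparing his U with the midpoint of the limits."""
    u = np.asarray(scores) @ weights
    return "Science" if u > cutoff else "Arts"
```

A call such as `allocate([50.0, 59.0, 50.0])` returns the name of the
group whose typical profile the new student's weighted score more
nearly resembles.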


Prediction of Continuous Variates from Continuous Variates


Finally we come to this type of relation which is best represented
in the Pearson coefficient of correlation. The coefficient of
correlation is however ‘a pure number’ ranging from -1 through 0 to
+1 and has no direct application to prediction in terms of original
values of variables. Prediction of continuous quantitative variables
is conditioned by the phenomenon of regression which should be
understood as a prerequisite of the prediction process.


Regression


Correlations are never perfect. In fact chance favours moderate
sized correlations but thereafter it begins to hinder their
increase, and in the ranges close to -1 and +1 chance error
interferes solidly in the attainment of perfect values. When
correlation between X and Y is not perfect the predicted values of
Y will depart from the exact point-to-point parallels of X. This
shift of Y’s away from exact parallelism with X is in the general
direction of the over-all Y-mean. This phenomenon has been
designated by Galton, the original user of the correlation
principle, as a tendency of the predicted values to ‘regress to the
group mean’. Consider for example the following figure:

X and Y are continuous correlated variables and $X_i$ and $Y_i$ are
the step intervals. Persons in each interval of X show a scatter
over some of the related intervals of Y. These are shown as a thick
vertical line within each interval column of X, with the dot in the
centre representing the Y-mean of that sub-scatter of Y. Thus for
each sub-group in X we get a Y-mean, shown as $M_{y1}$, $M_{y2}$
and so on. The overall mean of Y is $M_y$. The contention of Galton
was that there is, in all correlated measures, a tendency for the
$M_{yi}$ (i.e. the several Y-means) to draw closer to this general
mean. The line of best fit through these sub-group means tends to
be on a level with the $M_y$ line. This is the fact of regression
observable in all bivariate scatter diagrams of variables whose
correlation falls short of -1 or +1.


REGRESSION OF Y ON X


A similar diagram can be drawn for the regression of X sub-group
means (based on each group pertaining to a class interval of Y) to
the overall X-mean. In that case the line of best fit will tend to
become steeper and coincide with the vertical line representing the
general mean of X.


REGRESSION OF X ON Y


As in case of attributes the prediction of Y from X and X from Y is
not equally efficient. r, the symbol for correlation coefficient,
is the simplest of regression coefficients when z-scores are being
used. Thus

$$\hat{z}_y = r z_x$$

where $\hat{z}_y$ = predicted standard score in y
$z_x$ = given standard score in x
$r$ = correlation coefficient.


For all values of $z_x$, when r = 0 the estimates in $\hat{z}_y$ are
0 (i.e. the mean of y). Again for r = 1, $\hat{z}_y$ is the exact
equivalent of $z_x$. This procedural simplicity is due to the fact
that z values iron out the differences in means and standard
deviations of x and y (z being defined as $\frac{X - M}{\sigma}$).
Thus X and Y gross values become, as $z_x$ and $z_y$, exact
parallels and clearly reflect the fact of their inter-correlation.
Since r’s are always in experimental populations less than 1, the
fact of regression of all values of $\hat{z}_y$, the predicted
variable, may be seen by the effect of the multiplier r in the above
equation. For example we have a given value in $z_x$ of 2 and
r = .5. Then

$$\hat{z}_y = .5 \times 2 = 1$$

The person who was two standard deviations above the mean of 0 in
$z_x$ has come nearer to the general mean of $z_y$, which is 0, by
being reduced to 1.
If we are using deviations x and y ($x = X - M_x$, $y = Y - M_y$)
the equation must include the standard deviations. Thus

$$\hat{y} = r \frac{\sigma_y}{\sigma_x} x$$

This is the same equation as for $\hat{z}_y$ except for the standard
deviation of y being expressed in terms of the standard deviation
of x.

If even the effect of differing means is not to be ruled out in the
values we wish to use in the equation, the latter must make
adjustments for these also:

$$\hat{y} = r \frac{\sigma_y}{\sigma_x} x$$

would be written as

$$(\bar{Y} - M_y) = r \frac{\sigma_y}{\sigma_x} (X - M_x)$$

where $(\bar{Y} - M_y)$ = the estimated y, the bar replacing the
^ sign of estimation.

Therefore

$$\bar{Y} = r \frac{\sigma_y}{\sigma_x}(X - M_x) + M_y \quad \text{and} \quad \bar{X} = r \frac{\sigma_x}{\sigma_y}(Y - M_y) + M_x$$

These gross value equations will transform given values of X and Y
into equivalent values on the other variable on the basis of the
known constants r, $\sigma_x$, $\sigma_y$, $M_x$ and $M_y$.
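
The standard-score and gross-score forms of the regression equation can
be compared in a brief sketch. The function names and the constants of
the gross-score case are hypothetical; only the case of a standard
score of 2 with r = .5 is taken from the text.

```python
def predict_z(z_x, r):
    """Standard-score form: the predicted z_y is r times z_x."""
    return r * z_x


def predict_gross(x, r, mean_x, mean_y, sd_x, sd_y):
    """Gross-score form: r * (sd_y / sd_x) * (X - M_x) + M_y."""
    return r * (sd_y / sd_x) * (x - mean_x) + mean_y


# The worked case from the text: z_x = 2 and r = .5 regress to a z_y of 1.
z_hat = predict_z(2.0, 0.5)

# Hypothetical constants: X = 70 lies two standard deviations above M_x = 50.
y_hat = predict_gross(x=70.0, r=0.5, mean_x=50.0, mean_y=100.0, sd_x=10.0, sd_y=15.0)
```

The gross-score prediction lies one standard deviation of Y above the
Y-mean, which is the same regression towards the mean expressed in
original units.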

The standard error of estimate of y from x is given by the formula

$$\sigma_{yx} = \sigma_y \sqrt{1 - r^2}$$


Prediction of a Quantitative Variate from a Set of Variates


Where one criterion variable is correlated in turn with a set of
predictor variables, R, the multiple correlation coefficient, can
be calculated. For the simple case of the single criterion being
approximated by two predictors the equation for R is

$$R^2_{c.12} = \frac{r_{c1}^2 + r_{c2}^2 - 2 r_{c1} r_{c2} r_{12}}{1 - r_{12}^2}$$

(where c stands for the criterion and 1, 2 for the predictors).

When $r_{12} = 0$, the third term of the numerator and the second
term of the denominator become 0 and the formula reduces to

$$R^2_{c.12} = r_{c1}^2 + r_{c2}^2$$

For predictor sets of more than two, certain β-coefficients are to
be calculated by the solution of normal equations. This can be done
by either the Pivotal Condensation method of Aitken (1937) or the
method used in America due to Wherry and Doolittle (Guilford 1950,
Thorndike 1949). The betas can be directly used on z-values of the
predictors as multipliers. The resulting products are summed and
will produce a composite in z-form which will give the highest
possible correlation between the battery of predictors and the
criterion. The betas are the standard partial regression
coefficients. In the two-predictors one-criterion situation

$$\beta_{c1.2} = \frac{r_{c1} - r_{c2} r_{12}}{1 - r_{12}^2}$$

and

$$\beta_{c2.1} = \frac{r_{c2} - r_{c1} r_{12}}{1 - r_{12}^2}$$

In these equations the regression of the criterion on each single
predictor is found while the other predictors are held constant.
The multiple correlation between m predictors and criterion c is
given by the equation:

$$R^2_{c.12 \ldots m} = \beta_1 r_{c1} + \beta_2 r_{c2} + \cdots + \beta_m r_{cm}$$

When gross values are being used in equations, β’s are converted
into b-coefficients which take into account differences in standard
deviations of the variables. The gross value equation is of the
form

$$\hat{Y} = b_1 X_1 + b_2 X_2 + \cdots + b_m X_m + k$$

where $\hat{Y}$ = the estimated gross value of the criterion
$X_i$ = the gross value in any predictor i
$b_i$ = the related b-coefficient
$k$ = a constant

The k is a constant representing the intercepts in the regression
lines.
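
A compact sketch shows how the betas, the multiple R and the gross
score weights hang together. The correlations, means and standard
deviations are invented; the function names are hypothetical; and the
conversion $b_i = \beta_i \sigma_c / \sigma_i$ with
$k = M_c - \sum b_i M_i$ is the customary one, assumed here because the
text does not print it. The normal equations are solved by a library
routine rather than by Pivotal Condensation.

```python
import numpy as np


def betas_two_predictors(r_c1, r_c2, r_12):
    """Standard partial regression weights for two predictors."""
    denom = 1.0 - r_12 ** 2
    return (r_c1 - r_c2 * r_12) / denom, (r_c2 - r_c1 * r_12) / denom


def solve_betas(r_pp, r_cp):
    """Solve the normal equations R_pp * beta = r_cp for any number of predictors."""
    return np.linalg.solve(np.asarray(r_pp), np.asarray(r_cp))


def multiple_r_squared(betas, r_cp):
    """R squared as the sum of beta_i * r_ci."""
    return float(np.dot(betas, r_cp))


def gross_score_weights(betas, sd_c, sd_p, mean_c, mean_p):
    """Assumed conversion: b_i = beta_i * sd_c / sd_i, k = M_c - sum(b_i * M_i)."""
    b = np.asarray(betas) * sd_c / np.asarray(sd_p)
    k = mean_c - float(np.dot(b, mean_p))
    return b, k


# Hypothetical correlations: validities .50 and .40, predictors correlating .30.
r_c1, r_c2, r_12 = 0.50, 0.40, 0.30
b1, b2 = betas_two_predictors(r_c1, r_c2, r_12)
r_squared = multiple_r_squared([b1, b2], [r_c1, r_c2])
b, k = gross_score_weights([b1, b2], sd_c=12.0, sd_p=[10.0, 8.0],
                           mean_c=60.0, mean_p=[50.0, 40.0])
```

The value of `r_squared` agrees with the two-predictor formula for
$R^2$, and a person standing at the mean of both predictors is
predicted to stand at the mean of the criterion.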

Regression coefficients are the best weights in ‘the least squares’
sense, and are very popular in predictive processes based on
quantitative variates like test scores. ‘The least squares’
solution, it may however be noted, capitalizes on chance
fluctuations in favour of the obtained R. In subsequent samples
therefore R suffers an amount of diminution that is estimated by the
following ‘shrinkage’ formula due to Wherry (1931):

$$\bar{R}^2 = 1 - (1 - R^2)\frac{N - 1}{N - m - 1}$$

where $\bar{R}^2$ = $R^2$ corrected for ‘shrinkage’ in being applied
to another sample

m = the number of predictor variables
N = the sample size.


This ‘shrinkage’ estimate reverses the effect of the boost that ‘the
correction for attenuation’ gives to predictive validity.
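
The amount of shrinkage to be expected is quickly computed. The sketch
assumes the commonly quoted form of the Wherry correction,
$1 - (1 - R^2)(N - 1)/(N - m - 1)$, with m counted as the number of
predictors; other versions of the formula differ in the denominator.
The figures and the function name are hypothetical.

```python
def shrunken_r_squared(r_squared, n, m):
    """Wherry-type correction: 1 - (1 - R^2) * (N - 1) / (N - m - 1)."""
    if n - m - 1 <= 0:
        raise ValueError("sample size must exceed the number of predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - m - 1)


# Hypothetical case: R = .60 obtained from three predictors in a sample of 50.
corrected = shrunken_r_squared(0.60 ** 2, n=50, m=3)
```

The corrected value is smaller than the obtained $R^2$, and the loss
grows as predictors are added or as the sample becomes smaller.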


Inclusion of Tests in the Battery 


The Pivotal Condensation gives regression weights successively
for all numbers of tests, beginning with one, the test most highly
correlated with the criterion. At each stage the increment in R can
be quickly estimated. Using the analysis of variance it is then
possible to calculate F-ratios. When, with the addition of another
test, the increase

$$R^{2}_{(n+1)} - R^{2}_{(n)}$$

is not significant, d.f. being unity, in terms of the residual mean
square (Thomson 1951, p. 208), no further test need be added to the
existing battery.
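
A minimal sketch of such a test of the increment is given here. It
assumes the usual analysis-of-variance form, in which the gain in R²
(one degree of freedom) is divided by the residual mean square
(1 − R²)/(N − k − 1) of the enlarged battery of k tests; the function
name and the figures are hypothetical.

```python
def increment_f_ratio(r2_without: float, r2_with: float,
                      n_cases: int, n_tests_with: int) -> float:
    """F-ratio (1 and N - k - 1 d.f.) for the gain in R^2 from one added test.

    n_tests_with is k, the number of tests in the battery after the addition.
    """
    residual_df = n_cases - n_tests_with - 1
    if residual_df <= 0:
        raise ValueError("not enough cases for the residual mean square")
    residual_mean_square = (1.0 - r2_with) / residual_df
    return (r2_with - r2_without) / residual_mean_square


# Hypothetical case: a fifth test raises R^2 from .40 to .42 with N = 100.
f_ratio = increment_f_ratio(0.40, 0.42, n_cases=100, n_tests_with=5)
print(round(f_ratio, 2))  # compare with the tabled F for 1 and 94 d.f.
```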

Another solution to this has been offered by Horst (Thorndike
1949). If we have an existing regression weighted composite
of n tests and are considering the addition of a new test K, we may
find out by Horst’s formula given below what validity K must
possess against the criterion to effect a specified increase in R:

$$r_{0K} = r_{Kc} R_{0c} + \sqrt{a\,(a + 2R_{0c})\,(1 - r_{Kc}^{2})}$$

where a = the specified increase in R_0c
      r_0K = the validity required of test K to achieve the increase a
      r_Kc = the correlation of test K with the composite of n tests
      R_0c = the R of the battery of n tests.
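
The relation can be checked numerically. The sketch below assumes the
formula in the form r_0K = r_Kc·R_0c + √[a(a + 2R_0c)(1 − r_Kc²)] and
verifies it with the ordinary two-predictor multiple correlation,
treating the existing composite as a single predictor whose validity
is R_0c. The function names and the figures are hypothetical.

```python
import math


def required_validity(r_battery: float, r_test_with_composite: float,
                      increase: float) -> float:
    """Validity a new test needs in order to raise the battery R by `increase`."""
    unique_part = 1.0 - r_test_with_composite ** 2
    return (r_test_with_composite * r_battery
            + math.sqrt(increase * (increase + 2.0 * r_battery) * unique_part))


def multiple_r_after_adding(r_battery: float, r_test_with_composite: float,
                            validity: float) -> float:
    """Multiple R of the criterion on the old composite plus the new test."""
    numerator = (r_battery ** 2 + validity ** 2
                 - 2.0 * r_battery * validity * r_test_with_composite)
    return math.sqrt(numerator / (1.0 - r_test_with_composite ** 2))


# Hypothetical battery with R = .50; the new test correlates .40 with the composite.
needed = required_validity(0.50, 0.40, increase=0.05)
print(round(needed, 3), round(multiple_r_after_adding(0.50, 0.40, needed), 3))
```

The second figure printed is the new multiple correlation, which
should equal the old R plus the specified increase.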


Prediction of a Number of Criteria 


If there are several criteria and a general multi-potential battery
(Varma 1956) the regression weights for each of the criteria can
be summarily calculated. This requires a reciprocal matrix¹
of the original matrix of inter-correlations of the predictors to be
calculated by Pivotal Condensation (Thomson 1951). Multiplying
the rows of this reciprocal matrix by the criterion-predictor
correlations and summing the columns will give the weights for
each criterion in turn as shown in the example below.


¹ The reciprocal matrix is one which, post-multiplying its original
non-singular (|A| ≠ 0) matrix, produces an identity matrix which has
unity in the diagonal cells and zeros in the rest. Thus AA⁻¹ = I,
where A⁻¹ is the reciprocal or inverse of A. (See Thurstone, 1950,
pp. 23, 46; Thomson, 1951, p. 209.)


R: PREDICTOR MATRIX

Tests     1     2     3     4     5     6     7     8

  1       1   251   272   164   092   299   143   138
  2     251     1   317   252   152   154  -035   077
  3     272   317     1   135   212   175   083   087
  4     164   252   135     1   143   181   263   157
  5     092   152   212   143     1   173   097   082
  6     299   154   175   181   173     1   315   168
  7     143  -035   083   263   097   315     1   080
  8     138   077   087   157   082   168   080     1

The decimal points have been dropped and are to be assumed in the
matrices.


R⁻¹: INVERSE OF PREDICTOR MATRIX

Tests   r_0i       1      2      3      4      5      6      7      8

  1      471   1.206   -183   -212   -052    028   -258   -062   -080
  2      316    -183  1.238   -294   -269   -077   -082    199    001
  3      280    -212   -294  1.204   -004   -175   -055   -042   -026
  4      117    -052   -269   -004  1.189   -080   -033   -289   -122
  5      144     028   -077   -175   -080  1.084   -123   -035   -035
  6      140    -258   -082   -055   -033   -123  1.249   -324   -124
  7      034    -062    199   -042   -289   -035   -324  1.199    005
  8      381    -080    001   -026   -122   -035   -124    005  1.056


R⁻¹ multiplied by r_0i (the correlation between criterion 0 and
test i) and summated for columns will give the regression weights
for the predictors. The r_0i column can be replaced by any other
criterion-test correlations but the inverse can always be used to
calculate the weights as explained. In the above example the
regression weight for test 1 is found thus:

β₀₁ = 1.206 × .471 + (−.183) × .316 + (−.212) × .280 + (−.052) × .117
      + .028 × .144 + (−.258) × .140 + (−.062) × .034 + (−.080) × .381
    = .380

The weights for the entire set of predictors can be calculated and
come to:

Test        1      2      3      4      5      6      7      8
Weight   .380   .176   .100  -.044   .055  -.068  -.020   .321
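
The same arithmetic can be sketched in a few lines of code. The
figures below are transcribed from the inverse matrix and the r_0i
column of the example, with the decimal points restored; the names
INVERSE, VALIDITIES and regression_weights are only illustrative.
Since the inverse is symmetric, multiplying each row by the validities
and summing gives the same result as summing the columns.

```python
# Inverse of the predictor matrix, decimal points restored.
INVERSE = [
    [1.206, -0.183, -0.212, -0.052, 0.028, -0.258, -0.062, -0.080],
    [-0.183, 1.238, -0.294, -0.269, -0.077, -0.082, 0.199, 0.001],
    [-0.212, -0.294, 1.204, -0.004, -0.175, -0.055, -0.042, -0.026],
    [-0.052, -0.269, -0.004, 1.189, -0.080, -0.033, -0.289, -0.122],
    [0.028, -0.077, -0.175, -0.080, 1.084, -0.123, -0.035, -0.035],
    [-0.258, -0.082, -0.055, -0.033, -0.123, 1.249, -0.324, -0.124],
    [-0.062, 0.199, -0.042, -0.289, -0.035, -0.324, 1.199, 0.005],
    [-0.080, 0.001, -0.026, -0.122, -0.035, -0.124, 0.005, 1.056],
]

# Correlations of the eight tests with criterion 0 (the r_0i column).
VALIDITIES = [0.471, 0.316, 0.280, 0.117, 0.144, 0.140, 0.034, 0.381]


def regression_weights(inverse, validities):
    """Multiply each row of the inverse by the validities and sum the products."""
    return [sum(cell * r for cell, r in zip(row, validities)) for row in inverse]


weights = regression_weights(INVERSE, VALIDITIES)
print([round(w, 3) for w in weights])
```

For another criterion only VALIDITIES need be replaced; the inverse
is used unchanged.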

If instead of having a single satisfactory criterion we have a
set of them giving only partial satisfactions, it is possible to
weight them as well as the predictors so that a maximum correlation
between the composites will result. Hotelling (1935, 1936) has
proposed for such a case a solution in his concept of ‘the most
predictable criterion’, which is based on the canonical correlation
between two sets of variables and which is open for use to the more
advanced worker. Its use in ordinary research is however limited
as the criteria to be combined to produce a more satisfactory
composite are rarely amenable to merely mathematical
weighting.

When criteria are so close together that differences in regression 
weights for each one severally are not substantial, the researcher 
may be interested in evolving a single regression equation for all 
the criteria. Guttman (1941) has offered a formula for this type 
of ‘the most common regression’. 


Absolute and Differential Weighting 


Horst distinguishes between weighting for ‘absolute prediction’
and weighting for ‘differential prediction’. In absolute prediction
weighting the objective is to utilize the predictors to approximate
the several criterion scores as much as possible. In differential
prediction weights are calculated to produce maximum differences
among the composites for the different criteria. Horst’s monographs
(1954, 1955) pick out tests from a full battery for the above
purposes and also weight them differently for the multiple
criteria. Horst (Horst 1956, Horst and MacEwan 1956) has evolved
formulae for calculating the optimal length of tests so picked (in
terms of testing time) for both maximum absolute and maximum
differential prediction. His indices d and A are measures of the
differential and absolute predictive efficiency of tests picked out
and utilized by his formulae. These indices are new and unfamiliar
but the developments due to Horst are of considerable interest to
the more advanced student of the predictive process. The references
quoted are lucid enough once the symbolism is understood and include
illustrative material to clarify the processes involved.

The Horst developments solve the problems of predicting optimally
from a battery a set of multiple criteria and predicting them
again so as to heighten their differences in the composite scores.
Thorndike (1949) provides a simple formula for the simpler case
of two predictors and two criteria. In this case the validity of
the differential prediction is given by

$$r_{(A-B)(\alpha-\beta)} = \frac{r_{A\alpha} + r_{B\beta} - r_{A\beta} - r_{B\alpha}}{2\sqrt{(1 - r_{AB})(1 - r_{\alpha\beta})}}$$

where A = score on the test predicting criterion α
      α = criterion value in α
      B = score on the test predicting criterion β
      β = criterion value in β

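
A small numerical sketch may help here. It assumes the validity of
the differential prediction is the correlation between the difference
of the two standardized test scores and the difference of the two
standardized criterion values, which gives
(r_Aα + r_Bβ − r_Aβ − r_Bα) / 2√[(1 − r_AB)(1 − r_αβ)]. The function
name and all the correlations used are hypothetical.

```python
import math


def differential_validity(r_a_alpha: float, r_b_beta: float,
                          r_a_beta: float, r_b_alpha: float,
                          r_ab: float, r_alpha_beta: float) -> float:
    """Correlation between (A - B) and (alpha - beta) for standardized scores."""
    numerator = r_a_alpha + r_b_beta - r_a_beta - r_b_alpha
    denominator = 2.0 * math.sqrt((1.0 - r_ab) * (1.0 - r_alpha_beta))
    return numerator / denominator


# Hypothetical values: each test correlates .50 with its own criterion and
# .20 with the other; the tests correlate .40 and the criteria .30.
print(round(differential_validity(0.50, 0.50, 0.20, 0.20, 0.40, 0.30), 3))
```

When each test predicts the other criterion as well as it predicts its
own, the numerator vanishes and no differential prediction is possible,
however high the separate validities.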
As a matter of general prediction strategy, it is to be remembered 
that for any given criterion predictors must have low correlations 
among themselves and high correlations with the criterion (Mehro- 
tra 1950). This and the reverse situation could be graphically 
depicted as in the figures appearing on page 149. 

In Figure 1 the variances of tests 1 and 2 overlap in the shaded
area and show a high correlation r₁₂. Their separate unique
contribution to the coverage of the criterion is given by the areas
A and B, which are comparatively small. Thus in good part their
validity is being duplicated. The situation is expressible in some
such terms:

= 0344, et = - 10 

If we have a third test with no unique coverage of the criterion 
we would not be justified in keeping it in the battery as all its 
correlation with the criterion is a mere repetition of the work al- 
ready done by the other two tests. The calculation of regression 
coefficients automatically provides for the rejection of such repeat 
covariance in tests and reduces their weightage. The position
in Figure 2 is somewhat like this:


r₁₀ = .65, r₂₀ = .80, r₁₂ = .30


In this case both the tests are covering portions of the criterion
variance uniquely and this will ensure good weights for them.
In this prediction strategy sometimes a test might carry even a
negative weight when it acts as a ‘suppressor variable’ (Guilford
1950) and helps in suppressing a portion of the variance of a test
which is irrelevant and mixed up with genuinely valid elements.
The use of suppressor variables is an infrequent contingency but
is of interest from a theoretic point of view, for it emphasizes the
complex nature of the inter-relationship among predictor and
criterion variables, and the need to mobilize predictors as a team
rather than singly for a desired result.


Other Types of Weights


It is obvious that regression weights are derived mathematically
and if the original data is in any sense reliable they combine the
predictors arbitrarily for maximum results. If tests are parallels
of isolated functions in the criterion, each being of some
importance in itself, a blind, mechanical process will not
necessarily give due consideration to all the facts. In such a
situation minimum cut-off scores are prescribed so that an essential
characteristic is not entirely missed. In the regression situation
the composite does not tell us in detail how a person fared in the
tests and one can easily compensate loss in one test by a gain in
another. There can be many other rational or mathematical principles
of weighting, e.g., we may weight tests so as to maximize battery
reliability (Thomson 1940, Mosier 1943, Peel 1947). Or we might
employ factor analysis methods to determine weights when, using the
principal axes type of solution, we could produce weights which will
at once maximize the variance of the composite scores (Horst
1936) and minimize the variance of weighted scores for a given
person (Edgerton and Kolbe 1936). The principal axis solution
is rather tedious and a faster, approximate procedure using the
first centroid has been recommended by Horst (1936). Sometimes
we have a number of sub-tests in a battery and wish to combine
them so as to produce a good estimate of Spearman’s general
intellective factor. In a hierarchical battery the weights for the
general factor are proportional to

$$w_{i} = \frac{r_{ig}}{1 - r_{ig}^{2}}$$

where w_i = the weight of test i
      r_ig = the correlation between test i and g, the general factor.


Further it is possible to weight measures so as to attain any
desired size of σ in the composite; or again so as to equalize the
contribution of each by weighting each inversely to the σ (Guilford
1950). Gulliksen discusses weighting techniques at considerable
length in his admirable text on the Theory of Mental Tests,
but his treatment is rather advanced, chiefly in respect of the
matrix symbolism employed. An elementary knowledge of matrix
algebra will enable one to follow his exposition quite easily.


Classification 


Differential prediction is a new development which has been 
evolved due to the pressure of practical problems. The urgent 
need is to differentiate between criteria effectively using the same 
multipotential battery of predictors, rather than to deal with each 
separately howsoever successfully. Horst’s (1954) d as an index
of differential prediction has already been mentioned. This
aspect of the problem has received the notice of Brogden (1946),
Tucker (1948), Thorndike (1949) and Mollenkopf (1950), and leads to
the question of optimal classification so that given a group of per- 
sons and a set of jobs we distribute the available talent in such 
a manner that the best results are obtained. The objective of 
individual success is simple from the guidance point of view when, 
as in the scholastic setting, courses are a mere matter of choice 
for the student. When, however, there are limited job vacancies 
and limited applicants the joint problems of differential prediction 
and classification arise for optimum results for the entire organiza- 
tion or unit. These are problems that will draw increased atten- 
tion of research workers in this field. 

The question of numbers of persons applying raises the issue 
of the effect of selection ratio on validity of a battery of predictors. 
When we have a criterion value separating the successful from 
the failures and a predictor critical value which separates the rejec- 
ted from the accepted we have four groups into which the total 
number of applicants can be divided viz. A, the selected successful 
on criterion; B, the rejected who would have succeeded on crite- 
rion; C, the selected who fail on criterion; and D, the rejected 
who would have failed on criterion. Taylor and Russell (1939) 
proposed the consideration of the following ratios: 

$$S_{1} = \frac{A + B}{A + B + C + D}$$, the success ratio without selection by tests;

$$S_{2} = \frac{A}{A + C}$$, the success ratio with selection by tests;

$$p_{s} = \frac{A + C}{A + B + C + D}$$, the selection ratio.


When p_s begins to approach unity applicants are nearly as many
as the vacancies and refinements of procedures of selection are
unnecessary. As this ratio becomes sharper even tests of low
validity begin to pay off. p_s must be small to encourage use of
selection tests. Taylor and Russell (1939) and Guilford and Michael
(1949) provide nomographs showing the relationships among S₁,
S₂ and p_s for different sizes of validities. Guilford (1950) provides
a good discussion with illustrative material of problems relating
to selection ratio and validity.
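
The three ratios follow directly from the four group counts, as the
following minimal sketch shows. It assumes A, B, C and D are simple
head-counts of the four groups defined above; the function name, the
dictionary keys and the figures are hypothetical.

```python
def selection_ratios(a: int, b: int, c: int, d: int) -> dict:
    """Ratios of the Taylor-Russell kind from the four applicant groups.

    a: selected and successful      b: rejected but would have succeeded
    c: selected but failed          d: rejected and would have failed
    """
    total = a + b + c + d
    if total == 0 or a + c == 0:
        raise ValueError("need at least one applicant and one person selected")
    return {
        "success_without_tests": (a + b) / total,
        "success_with_tests": a / (a + c),
        "selection_ratio": (a + c) / total,
    }


# Hypothetical 100 applicants: 40 are selected, of whom 30 succeed.
print(selection_ratios(a=30, b=20, c=10, d=40))
```

The gain from testing is the difference between the second and the
first ratio.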


The Curvilinear Relationship

When a straight line fit is not possible to the bivariate surface
in a scatter-plot, there is need to consider an alternative kind of
description. η (eta) is the statistic which represents such a
relationship. The η is the ratio between the σ’s of the predicted
and obtained values of the variable which regresses. So there are
two η’s or correlation ratios, one for each regression. The
procedure of calculation is given in most statistical texts. The use
of the curvilinear regression in prediction is somewhat rare but
where a suspicion exists that the data depart significantly from
rectilinearity, it would be desirable to compute both r and η and
use a Chi-square test by means of the formula:

$$\chi^{2} = (N - K)\left(\frac{\eta^{2} - r^{2}}{1 - \eta^{2}}\right)$$

where K is the number of columns or rows. The degrees of freedom
used are K − 2. If the χ² is significant, regression is to be
regarded as not conforming to the rectilinear fit.
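
As a numerical sketch of this test, assuming the statistic has the
form χ² = (N − K)(η² − r²)/(1 − η²) with K − 2 degrees of freedom; the
function name and the figures are hypothetical.

```python
def linearity_chi_square(eta: float, r: float, n_cases: int, n_classes: int):
    """Return (chi-square, degrees of freedom) for departure from rectilinearity.

    n_classes is K, the number of columns (or rows) of the scatter-plot.
    """
    if not 0.0 <= abs(r) <= eta < 1.0:
        raise ValueError("expects |r| <= eta < 1")
    chi_square = (n_cases - n_classes) * (eta ** 2 - r ** 2) / (1.0 - eta ** 2)
    return chi_square, n_classes - 2


# Hypothetical data: eta = .60, r = .50, N = 110 cases in K = 10 columns.
chi_square, df = linearity_chi_square(0.60, 0.50, n_cases=110, n_classes=10)
print(round(chi_square, 2), df)  # compare with the tabled chi-square for df
```

When η and r are equal the statistic is zero and the straight line
describes the regression as well as the column means do.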

Both factor analysis and statistical prediction are useful tools
of research in education and psychology and research in these
fields can be directed firstly to the application of existing
techniques to practical problems and secondly to the advancement of
theory and the development of new techniques of solution; this
latter field would be open to only those who have a sound knowledge
of advanced mathematics.


XI UTILITY METHODS IN STATISTICS

The Practicable versus the Ideal 


Research in education and psychology is frequently directed 
to solve practical problems. Education is largely an activity, 
and research therein would normally stress the applied aspect 
of knowledge. In psychology even when mental facts are studied 
for their own sake, their discovery is inevitably followed by their 
application to human affairs. In fact the position in almost all 
areas of human knowledge is such that applications far exceed 
the principles or ‘laws’ uncovered by research. Much of the classi- 
cal theory in statistics is concerned with the mere ‘description’ 
(in the Karl Pearson sense) of phenomena in probabilistic terms; 
in other words with problems of estimation of population para- 
meters and related error. Pursuit of such objectives is comparable 
to activity in the field of ‘pure’ science. Statistical models of 
this class are mathematical abstractions and they do not in their 
original form solve practical human problems in concrete situa- 
tions; nor does the reality as we know it conform to their idealized 
models in actual details. Therefore, in actual use the statistical 
concepts acquire more modest forms and objectives. These work- 
a-day statistics enable us to arrive at working hypotheses and tole- 
rably good decisions in the world of human affairs. In this section 
it is intended to introduce to the reader some of these more 
practical types of statistics. 


Power functions and OC Curves 


In hypothesis testing (Chapter III) reference was made to the
critical ratio ‘t’ and the levels of significance. The 1 and 5 per
cent levels refer to the probability of the rejection of a hypothesis
when it is true. Statistical hypotheses must lead to decisions of
their acceptance or rejection. Two kinds of mistakes can be made
in such decisions: (1) Type I error, with probability α, of rejecting
the hypothesis when it is true; (2) Type II error, with probability
β, of accepting the hypothesis when it is false.
The customary level of significance with which the reader
became familiar in Chapter III refers to α. Normally, therefore,
we will not reject a true hypothesis more than 1 or 5 per cent of
the time. The hypothesis that is used in the absence of any special
knowledge is the Null hypothesis, which is symbolized by H₀.
The 1 or 5 per cent levels of significance, therefore, make it rather
hard for the given hypothesis to be rejected and in case of H₀ we are
extremely loth to attribute any special contours in the data to
anything else except chance. This is of course because the
scientific worker wants to play safe and take the minimum risk of
mistaking chance contours for systematic ones in his data. In other
words the worker challenges nature to prove its order (which is
non-chance and systematic) beyond a shadow of doubt. To give
a concrete example let us take the case of a difference between
the means of boys’ and girls’ scores on a vocabulary test. If we
are committed to take no more than 1 per cent risk of rejecting
the null or no-difference hypothesis when it is true, we are making
it too difficult for a small but real difference to prove itself. In
this way we run into the arms of Type II error: when the real
difference is very small we may accept the null hypothesis nearly
99 per cent of the time although it is false. Decision-making
requires that both these types of errors should be used in the
strategy of tests of hypothesis so that in the pursuit of a nearly
undeniable truth we do not ignore utilitarian differences of minor
order. The power of a test is defined as

$$1 - \beta$$

It is the probability of not committing Type II error, viz. that of
accepting a false hypothesis as true. The Power of a test of
hypothesis enables us to reject a false hypothesis. If then a small
but real difference between boys’ and girls’ scores does exist, a
powerful test will bring it out rather than have it swamped by
permitting the Null hypothesis to prevail over the alternative.
Science is interested in finer and finer differentiation and not
merely in establishing fundamental and major schisms among classes
and orders of phenomena. Power of a statistical test brings into
relief such minor differences. In making decisions of practical
importance both types of risks are to be considered. Power and
operating characteristic curves are means to this end. “Ideally α
and β are specified”, say Dixon and Massey (1957), but in actual
practice only α of .01 or .05 and N are specified and β is left to
fend for itself. This can lead to some very uneconomical decisions.
That is why sometimes α and β are spoken of as producers’ and
consumers’ risks, which mean rejection of sound goods as sub-standard
(rejection of null hypothesis when it is true) and acceptance of
sub-standard goods for marketing (acceptance of null hypothesis
when it is false). Examples of the power and the OC curves
are given below and on page 106.


[Figure: POWER FUNCTION FOR HYPOTHESIS M = M₀, α = .05, N = 100.
Vertical axis: 1 − β, the probability of rejecting H₀, up to 1.00;
horizontal axis: the true mean, from M₀ to M₀ + 3σ_M.]


It is obvious that, for a fixed N, β increases as α is reduced, and
for fixed values of α smaller N’s will increase the risk of wrong
acceptance of hypothesis. The best safeguard against error is
obviously a generous N.
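
The shape of a power function can be sketched numerically. The
example below rests on assumptions not stated in the text: a one-sided
test of M = M₀ against larger means, a normally distributed sample
mean, and the true mean expressed as a number of standard errors above
M₀. The function names are illustrative; the OC curve is simply the
complement of the power.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_one_sided(shift_in_standard_errors: float, alpha: float = 0.05) -> float:
    """Probability of rejecting H0 when the true mean lies
    `shift_in_standard_errors` standard errors above M0 (assumed one-sided test)."""
    critical_z = STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STANDARD_NORMAL.cdf(critical_z - shift_in_standard_errors)


def oc_one_sided(shift_in_standard_errors: float, alpha: float = 0.05) -> float:
    """Operating characteristic: probability of accepting H0 (1 - power)."""
    return 1.0 - power_one_sided(shift_in_standard_errors, alpha)


for shift in (0, 1, 2, 3):
    print(shift, round(power_one_sided(shift), 3), round(oc_one_sided(shift), 3))
```

At a shift of zero the power equals α itself; lowering α from .05 to
.01 lowers the whole curve, which is the trade-off described above.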

The power and OC curves are useful tools in practical
decision-making and have a variety of applications in human affairs
where risks of erroneous acceptance and rejection are both to be
considered and counter-balanced. They can be put to good use in
the treatment of problems dealing with equality of two samples
or of a sample and a given standard. Dixon and Massey (1957)
provide a good treatment of such functions. They also offer
tables which give critical ratios for various α and 1 − β, and N’s
necessary to attain various degrees of power for specified α’s and
critical ratios. The tests can be one-sided for situations where
differences are investigated only in one direction or two-sided
when either of the compared values may exceed the other. OC
curves can be used with χ², F-ratios, differences between Ms and
Ps (Mosteller and Bush 1956).

[Figure: OC CURVE FOR HYPOTHESIS M = M₀, α = .05, N = 100.]


Sequential Sampling 


In all testing of hypotheses N figures prominently as the single
constant determiner of the error interval. Random error varies
inversely with √N. In this see-saw relationship there is an
optimum balancing point where the smallest possible N for the purpose
has just led to rejection or acceptance of the hypothesis.
Sequential analysis (Dixon and Massey 1957) is a technique for
finding such optimal N’s. We must in any case state the risks we are
willing to take as probabilities of α and β errors. In sequential
analysis we reach a decision regarding the hypothesis without
increasing these risks on the basis of a minimum necessary N. In the
course of normal procedure a sample of any predetermined size
is taken and using it the investigator may attain a significance of
something far beyond the required 1 per cent level of confidence. If
we are willing to regard 1 per cent and less as good enough
significance level, using a generous N which helps to reduce it to a
value phenomenally small would appear a waste of effort. It
is for this reason that in the situations where it applies and is
practicable a research worker may be inclined to favour the
technique of sequential sampling.

In sequential sampling we begin with a minimum N and go
on adding observations singly or in small sets of equal size until a
decision is reached regarding the hypothesis under examination.
The quantity $p_{0n}$ gives the probability of n observations of a
kind being made up to the attained level of N if H₀ is true; and
$p_{1n}$ gives the other probability for similar observations to
occur in case an alternative hypothesis H₁ is true. The larger of
the values $p_{0n}$ and $p_{1n}$ decides the issue in favour of H₀
or H₁ respectively. If the difference between them is negligible
further observations are made. For given risks α (rejecting H₀ when
it is true) and β (accepting H₀ when H₁ is true)

$$\frac{p_{1n}}{p_{0n}} \le \frac{\beta}{1 - \alpha} \quad \text{leads to acceptance of } H_{0}$$

$$\frac{p_{1n}}{p_{0n}} \ge \frac{1 - \beta}{\alpha} \quad \text{leads to acceptance of } H_{1}$$

$$\frac{\beta}{1 - \alpha} < \frac{p_{1n}}{p_{0n}} < \frac{1 - \beta}{\alpha} \quad \text{leads to further observation.}$$

For an example, let the proportions p₀ and p₁ of one kind of article,
say defectively marked paper, be under H₀ and H₁

H₀: p₀ = .05
H₁: p₁ = .20

Further α = .01 (probability of rejection of H₀ when p = .05)
        β = .05 (probability of acceptance of H₀ when p = .20)

Then

$$p_{0n} = p_{0}^{W}(1 - p_{0})^{R}$$

and

$$p_{1n} = p_{1}^{W}(1 - p_{1})^{R}$$

where W and R stand for the number of times papers are wrongly
and rightly marked, W + R = n.

If we have obtained n = 8 observations of which W = 3 and R = 5
and the order of occurrence is RWRRWWRR, then

$$p_{0n} = (1 - p_{0})\,p_{0}\,(1 - p_{0})(1 - p_{0})\,p_{0}\,p_{0}\,(1 - p_{0})(1 - p_{0}) = p_{0}^{3}(1 - p_{0})^{5}$$

The ratio $p_{1n}/p_{0n}$ changes as observations are added to the
sample and ranges between the limits $\beta/(1 - \alpha)$ and
$(1 - \beta)/\alpha$. In the present example these limits are .05
and 95.

The ratio $p_{1n}/p_{0n}$ can be summarily derived by being written as

$$\frac{p_{1n}}{p_{0n}} = \left(\frac{p_{1}}{p_{0}}\right)^{W}\left(\frac{1 - p_{1}}{1 - p_{0}}\right)^{R}$$

The powers W and R, i.e. the numbers of papers that are wrongly
and rightly marked, are inconvenient to apply and the expression can
be converted into natural logarithms. We can rewrite the above as

$$\log\frac{p_{1n}}{p_{0n}} = W\log\frac{p_{1}}{p_{0}} + R\log\frac{1 - p_{1}}{1 - p_{0}}$$

Putting the second member of this equation equal to log 95 and
log .05 at any point n, two equations can be obtained with W
and R. Substituting values of one they can be solved for the
other and a graph prepared as shown below:


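[Figure: SEQUENTIAL SAMPLING GRAPH. Vertical axis: W, the number of
wrongly marked papers; horizontal axis: the number of observations.
Two boundary lines separate three regions: ‘Accept H₁’ above,
‘Continue sampling’ between, and ‘Accept H₀’ below.]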


Each additional case is a movement of a line up or down according
to whether the paper is Wrong or Right and when the lines are
crossed the indicated decisions are taken. If neither is crossed
further observations are made.
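
The decision rule of this example can be followed step by step in a
short sketch. It uses the values of the example (p₀ = .05, p₁ = .20,
α = .01, β = .05) and the limits β/(1 − α) and (1 − β)/α in logarithmic
form; the function name and the coding of papers as "W" and "R" are
only illustrative.

```python
import math


def sequential_decision(marks: str, p0: float = 0.05, p1: float = 0.20,
                        alpha: float = 0.01, beta: float = 0.05):
    """Follow the log of the ratio p1n/p0n paper by paper.

    marks is a string of 'W' (wrongly marked) and 'R' (rightly marked).
    Returns the decision and the number of papers examined when it was reached.
    """
    lower = math.log(beta / (1.0 - alpha))    # at or below: accept H0
    upper = math.log((1.0 - beta) / alpha)    # at or above: accept H1
    step_wrong = math.log(p1 / p0)
    step_right = math.log((1.0 - p1) / (1.0 - p0))

    log_ratio = 0.0
    for n, mark in enumerate(marks, start=1):
        if mark not in ("W", "R"):
            raise ValueError("marks must consist of 'W' and 'R' only")
        log_ratio += step_wrong if mark == "W" else step_right
        if log_ratio >= upper:
            return "accept H1", n
        if log_ratio <= lower:
            return "accept H0", n
    return "continue sampling", len(marks)


print(sequential_decision("RWRRWWRR"))  # the eight papers of the example
```

For the eight papers of the example neither limit is reached, so
further papers would have to be examined.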

Sequential sampling would obviously enable us to take decisions
on smaller and just sufficient N’s under the specified risks. It
has found some very ingenious applications in the field of
psychometrics. Walker and Cohen (1949) propose its use for selecting
test items on the basis of their satisfactoriness or
unsatisfactoriness as judged by some criterion, sampling being from
the two ends of the group. Similarly Kimball (1950) set up formulae
for checking answer sheets for scoring defect. For any given number
of test papers re-examined, the decision is automatically made
regarding rejection of the lot as badly marked, acceptance of the
lot as meeting the specified standard of satisfactoriness, or the
need to continue sampling (Varma 1956, 1960). In fact sequential
sampling can have innumerable applications to problems in
education and psychology. Its chief advantage is economy of N
to be sampled provided each fresh case or set of cases is readily
available and does not involve the researcher in the extravagance
of repeat experimentation.


Non-parametric Statistics 

It has been pointed out that classical statistics is concerned
with the estimation of population parameters and the related
theory of error. All such statistics assume some kind of a
distribution in the population. This assumption is the basis of
their rationale and development. The statistical tests they offer
are rigorous and more efficient even if they need N’s of
considerable size. Measurement in psychology and education does not
invariably give us interval and ratio types of scales, on which
parametric statistics are based. For conditions, therefore, under
which the N’s are not considerable, distributions of the variable
in the populations are not known and the nominal and ordinal
class of measures only are available, non-parametric statistics
become applicable. They are not merely distribution-free but
are also much easier to apply. Unfortunately the tables and
nomographs necessary for their interpretation are not as easily
available as those for the parametric tests. The research worker
in psychology and education would, even so, frequently come
upon data which justifies their use and a familiarity with this
type of approach to problems of individual differences will be
found rewarding in the long run. The assumptions underlying
their use are so few and unrestrictive that they can be applied
as a first preliminary probe to any kind of data. The χ² and other
enumeration statistics are examples of this yeoman methodology
which mobilizes for its purposes such nominal and ordinal statistics
as are associated with mode, median, percentile, contingency
coefficient, mere frequency, and the rank correlation. We shall
consider here two examples of non-parametric tests to indicate
their essential nature.

The Wald-Wolfowitz run test: We take a sample N of boys
whose general behaviour in school is normal, i.e. not remarkable
for any oddity or peculiarity of conduct, and generally
satisfactory. Subject any number m of these to special conditioning,
say, frustration or some kind of privation or special treatment.
If we secure for all the N boys their scores on an adjustment
inventory or other objective assessment of their tendency to
misdemeanour we can order them according to the degree of their
adjustment. The succession of members of the conditioned group or
the control group without break is called a ‘run’ in the order
produced by the objective assessment or the inventory. If only
one case occurs to break a line of succession it counts as a run in
its own right, e.g. in the order below a’s represent the conditioned
and b’s the control group membership:


aababbbaabaaaababbaaabbab


Here Σa = 14, Σb = 11, N = 25 and r = 14. The total number
of runs of a and b in this order is 14.
For the 5 per cent level of significance tables are available which
specify the limits of r at different values of n and m, m being less
than or equal to n. These tables give the following limits,


18 > r > 8


Since 14 is less than 18 and greater than 8 we may conclude at the
5 per cent level of confidence that the conditioning has not
produced any marked change in the adjustment of the boys. Many
other kinds of situation in school and the psychology laboratory
will admit this kind of solution.
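
Counting the runs is the only arithmetic the test requires, and it can
be sketched as follows. The ordering is the one given in the example;
the limits 8 and 18 are the tabled values quoted above, not computed
here, and the function name is illustrative.

```python
def count_runs(order: str) -> int:
    """Number of unbroken successions of like symbols in the ordering."""
    if not order:
        return 0
    # A new run starts wherever a symbol differs from the one before it.
    return 1 + sum(1 for previous, current in zip(order, order[1:])
                   if previous != current)


ORDER = "aababbbaabaaaababbaaabbab"  # a = conditioned boys, b = control boys
runs = count_runs(ORDER)
print(ORDER.count("a"), ORDER.count("b"), runs)

# Limits quoted from the tables in the text for these group sizes.
LOWER_LIMIT, UPPER_LIMIT = 8, 18
print("no marked change" if LOWER_LIMIT < runs < UPPER_LIMIT else "marked change")
```

Very few runs would mean that the two groups occupy different ends of
the ordering, which is the outcome the conditioning would be expected
to produce if it had any effect.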

The Corner test for Association: Supposing we are interested 
in finding out if two variables are correlated, and not knowing 
their population distribution require a snappy preliminary index 
on the basis of a few cases only. We would then, following 
non-parametric methodology, prepare a scatter plot of cases
upon the intersecting medians of the two variables as shown 
on page 162. 

It is obvious that the cases in the quadrants marked (+) produce
good positive correlation; those in quadrants marked (−) produce
good negative correlation and cases in both types of quadrants
cancel out these trends mutually. In the corner test begin a
count of points (representing each case) from extreme right until
a point is met which lies in the adjacent quadrant; e.g. there are
3 points in the upper right hand quadrant but the 4th crosses over
to the lower right hand quadrant. Points in a continuous run
lying only in one quadrant are similarly counted from the left, top
and bottom. If two points lie on the straight line in two quadrants
then the value of the one being counted is halved or further
fractioned. In the present case the count from the four directions
of such points gives


Right     3
Left      3
Top       3.5
Bottom    2

Total    11.5

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Note that one point is counted twice and another is half on account 
of the tie. There is available a tdble (Dixon and Massey iga7 
which gives for various sizes of 2V and S, values of «, the probability 
that the quadrant sum equals or exceeds the given values of S. 
In the present case the probability « for § =11 and 2N>14 is .034 
which is significant at .05 per cent level. We would, therefore, con- 
clude that the variables are not independent. This is like the x? 
test of independence being only quicker in the arithmetic required. 
The methodology of distribution-free statistics is receiving increas- 
ing attention and a large number of tables for interpretation of 
critical indices are now available. Some of these tests will abridge 
labour considerably without sacrificing technical elegance and the 
research worker interested in them is referred to Sidney Siegel’s 
clear text Non-parametric Statistics (1956), Mosteller and Bush (1956)
and Dixon and Massey (1957) who provide both illuminating
treatment and necessary tables. Siegel treats the one sample,
two related and independent samples and K-sample cases as also
non-parametric measures of correlation such as Spearman’s ρ and
Kendall’s τ and W (Kendall 1948) for all types of measures includ-
ing the two interval types also. It must be noted however in
concluding this topic that non-parametric statistics are statistics-
in-a-hurry and if the usual parametric methods apply there is
no reason why they should not be used unless the desire is to secure
a preliminary snap-judgement of the general nature of the data
before proceeding to apply the more efficient formal methods of
the parametric class.
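
Spearman’s ρ, named above among the non-parametric measures of
correlation, can be worked out from ranks alone. The following is a
minimal sketch, assuming no tied scores; the function names and the
two sets of test scores are invented for illustration and do not come
from any of the texts cited.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho as 1 - 6*sum(d^2) / (n(n^2 - 1)); valid without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least two pairs")
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores of five pupils on two tests (illustrative only).
test_a = [12, 15, 9, 20, 17]
test_b = [30, 41, 28, 39, 45]
rho = spearman_rho(test_a, test_b)
print(round(rho, 2))
```

Only the order of the scores enters the calculation, which is why the
measure makes no assumption about the form of the distribution.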
CHAPTER VI


THE ROLE OF THE EXPERIMENTAL 
AND CLINICAL METHODS IN 
RESEARCH 


A. THE EXPERIMENTAL METHOD 


It has been pointed out earlier that clinical and experimental
methods are distinguished from each other and from the statistical 
methods by differences in their centres of interest, objectives and 
end-results. Strictly speaking, clinical methods are not methods 
of research at all, for the truth they seek is of a particular rather 
than the general class. It is concrete and time-bound and ana- 
logous to the truth of history. In the form of the case study, 
however, even the clinical method tries to add to the quantum 
of generalized human knowledge, and is of some value as a form 
of approach to certain human problems. We shall revert to an 
appraisal of it later. 

The experimental method is intended directly to add increments 
to human knowledge. The association of the word experiment 
with the physical sciences bestows on it a laboratory sanctity and 
has tended to limit its significance for the uninstructed. We shall 
consider the leading characteristics of this method since it sets 
the pattern for most experiments conducted in psychological 
laboratories. It has been stated that science describes the pheno-
mena of its domain through measurement. In the physical sciences
both extensive and intensive types of scales apply. Science has
developed by now several ‘absolute measures’, such as the calorie,
the ampere or in psychology the I.Q., which claim an international
meaning. In any given experiment, however, the worker may
be interested in comparisons only, when any convenient measure
will do. It is obvious that psychology and education possess very
few absolute measures which may be referred to international
standards.

The experiment of the physical laboratory type begins with a 
specific enquiry for which reason the experiment is rightly described 
as ‘a question put to nature’, the answer to which may be yes, 
no or an evasion. A good experimental set-up will minimize
the scope for evasion. The enquiry may be in the form of an 
explicit question or it may be couched in the form of a hypothesis; 
e.g. the question may be asked “What change occurs in the coldness
of ice when salt is added to it?” or we may hypothesize that 
ice becomes colder with the addition of salt, and test the correctness 
of the hypothesis. The hypothesis limits the enquiry and cuts 
down the alternative outcomes. There are different orders of 
facts for different domains of science and the enquiry has relevance 
to this order and none other and seeks to discover the order of 
nature in that domain. The simple question “Why did the cat
Sammy die?” can be answered from the point of view of the 
physiologist, the law man in case it was murdered or the municipal 
councillor in case it was destroyed as a part of a drive against cats. 
These orders of facts which interpenetrate and overlap in a single 
occurrence have each a relevance to a domain and the hypothesis 
has to avoid being general to forestall confusion in response. 


(I) The Psychological Experiment¹


Formidable apparatus is very characteristic of experiments in 
science. The apparatus is designed to reduce errors of observation,
systematic and random, to a minimum and to maximally control
the conditions under which the experiment is conducted. The
laboratory is equipped with a view to facilitate a maximum control
of conditions which again reduces random error. If the simple
description of an object and its behaviour is the objective, the
experiment consists of measurements, treatments and observations
(with the help of instruments) which are recorded. We might merely
be studying the properties of light. The kind of set-up necessary 
for this is different from that in which effects of variables are 
investigated. In the latter an object is not being defined by 
exact attributes but a relation is in issue. Under such circumstances 
a control is set up against an experimental object. Some property
is observed and then differential treatment is applied and changes
observed; e.g. the differences due to the addition of salt and sand
to ice may be examined against the temperature of plain ice,
which acts as the control. In such experiments the independent
variable is the factor we are free to change; here, the thing that
is added to ice. The dependent variable is the effect, viz. the
temperature.

¹ The classical psychophysics discussed in Chapter III ideally exemplify the
methods of psychological experimentation and contextually belong here.

When an experiment such as the above is carried out the observa-
tions are carefully recorded. These, if they are not mere qualitative
description, need some further analysis. The object of the analysis
is to condense and marshal observations in such a manner that
the characteristics are made to stand out or the hypothesis is
clearly rejected or accepted by the result. The experiment ends
by our drawing a conclusion about the phenomena in our particular
order of facts. In the physical sciences there are sensitivity
experiments which establish critical values in variables for certain
significant changes in them or results they produce; e.g. what maxi-
mum pressure will a metal construction of given specifications and
alloy bear without risk of failure? In psychology the psychophysical
methods investigating conditions of subjective equality and threshold
(described in Chapter III) are like the sensitivity experiments of
science.
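
The condensing of observations against a control may be sketched as
follows for the ice experiment. The temperature readings are invented
for illustration and are not data from any actual experiment; the
names `readings` and `effects` are likewise assumptions of this sketch.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius (made-up numbers).
# The independent variable is what is added to the ice; the dependent
# variable is the temperature recorded.
readings = {
    "plain ice (control)": [0.0, -0.1, 0.1],
    "ice + salt": [-8.0, -9.0, -10.0],
    "ice + sand": [0.0, 0.1, -0.1],
}

control_mean = mean(readings["plain ice (control)"])

# Effect of each treatment = its mean temperature minus the control mean.
effects = {
    treatment: mean(values) - control_mean
    for treatment, values in readings.items()
    if treatment != "plain ice (control)"
}

for treatment, effect in effects.items():
    print(f"{treatment}: {effect:+.1f} degrees relative to the control")
```

The three raw lists are thus reduced to one figure per treatment, which
is the form in which the hypothesis about salt can be accepted or
rejected.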

The laboratory paraphernalia and set-up are met with in psycho- 
logical experiments also but there is a radical difference between 
the domains of the physical sciences like physics and chemistry 
and of the biological sciences. The object of study in the former 
is governed by mechanical principles and is characterized by a 
uniformity of behaviour. The object of study in biological science 
has only a probable uniformity of behaviour and shows variability 
in reaction in the same object at different times or from object to 
object. That is why in physical science the light or heat can be 
studied for its ‘laws’ and except for errors of observation the laws 
will hold. The study of a single case therefore has some chance 
of being revelatory of a whole class of phenomena. As against 
this in biological science the fact of intra-person and inter-person 
variability creates problems of norms and trends the solution of which 
requires statistics. The universe of physics has ended by being 
probabilistic in its transactions; but for practical purposes its 
behaviour is as good as wholly deterministic. Experiments in
psychology and education therefore use statistics compulsorily 
as otherwise generalizations in them simply cannot be made. 
Minds addicted to the methodology of the physical sciences 
are chary of non-objective, externally unobservable order 
of facts such as the mentality of man and the animals. They
insist on externally observable, objective data and are responsible
for the school of thought known as Behaviourism. For this
reason, in psychology the methods are divisible into objective and
subjective. It would appear that the refusal to consider subjective
data and insistence on expression of it in overt organic behaviour
(such as salivation of Pavlov’s dogs or an outburst of grief), is
tantamount to a redefinition of psychology which by its name
suggests the inwardness of mental experience. Good psychologists
therefore do not discard subjective experience as an inadmissible
class of facts but only require that it is not assessed and interpreted
by the experimenter subjectively, which is quite another matter.
A journal like Psychometrika is therefore “devoted to the
development of psychology as a quantitative, rational science”.
The approaches to human mentality have been considered inter- 
estingly by Stephenson (1956) who proposes the following classi- 
fication: 

For person X observed by person Y we have the possibility of
the following categories of data (Stephenson 1956):

(a) ‘Inner’         (1) ‘Psychism’ of X
                    (2) Introspection of X
                    (3) Reconstruction of 2 by Y

(b) ‘Outer’         (4) Observation of X by Y
                    (5) Self-observation of own behaviour
                        by X

(c) ‘Historical’    (6) Through 2 and 5
                    (7) Through observation of X by Y over
                        a period of time.

According to Stephenson narrow introspectionism is concerned 
with (a)(1); the behaviourists favour (b)(4); psycho-analysts are 
confined to (c)(6); the Gestaltists regard (a)(2) as the main source 
of data. Experimental psychology uses sources (a)(1), (a)(2) 
and (b)(5) for its method of impression, and sources (a)(3) and
(b)(4) for its method of expression. 

An experiment in psychology requires notes on day and hour 
of experiment, name of the subject, his general condition of body 
and mind, the statement of the problem to be studied, brief descrip- 
tion of apparatus and material, the method of the experiment 
with the data and its treatment and finally the result leading to 
a conclusion (Collins and Drever 1949). In most experiments
carried out with elaborate apparatus the data need only elementary
statistical manipulation, the emphasis here being on elimination
of error before observations are recorded and thorough control
of conditions.
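
The notes required for an experiment can be kept as a simple structured
record. The sketch below is one possible arrangement only; the class
name, the field names and the sample entry are all hypothetical and are
not taken from Collins and Drever.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ExperimentRecord:
    """One laboratory session; field names are illustrative assumptions."""
    conducted_at: datetime      # day and hour of the experiment
    subject_name: str
    subject_condition: str      # general condition of body and mind
    problem: str                # statement of the problem studied
    apparatus: str              # brief description of apparatus and material
    method: str
    data: list = field(default_factory=list)
    result: str = ""
    conclusion: str = ""

    def missing_items(self):
        """Return the names of fields that were left empty."""
        return [name for name, value in vars(self).items() if value in ("", [])]


# A made-up entry, filled in before any observations are taken.
record = ExperimentRecord(
    conducted_at=datetime(1960, 1, 15, 10, 30),
    subject_name="Subject A",
    subject_condition="rested, attentive",
    problem="Threshold for difference in lifted weights",
    apparatus="Set of graded weights",
    method="Method of constant stimuli",
)
print(record.missing_items())
```

A check of this kind merely reminds the worker which parts of the
record remain to be completed after the session.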


(II) The “Field” Experiment 


Experiments in biology and social sciences like education and 
psychology have frequently to deal with data which cannot be 
brought into the laboratory. Such are the many ‘field’ experi- 
ments in which the data is appraised in the field which is its natural 
setting. Not only is it not possible to bring it into the laboratory 
but it may also be that such transplanting might alter its nature 
on account of the artificiality of the laboratory situation. The 
‘field’ setting offers many more ‘cases’ as subjects to compensate 
for the lack of control on conditions and less precise methods of 
stimulation and response. Statistical methodology must take 
care of these on the basis of the augmented N. In fact the labo-
ratory usually multiplies its observations by repeat trials on the
same person. For this reason, in agriculture, medicine and social
sciences there is need for statistical designing of experiments. 
The ‘field’ investigation differs from the survey in respect largely 
of numbers and further relaxation of conditions and simpler 
methods of stimulation and response. 

The ‘field’ experiment is characteristic of research processes 
in the social sciences, and a number of logical categories of 
these have been offered. Greenwood (1945) mentions the pure, 
the uncontrolled, the trial-and-error and the ex post facto types of 
experiment, and the controlled observational study. These 
terms are self-explanatory except for the ex post facto form of experi- 
ment. In this, present observations are traced backwards through 
such recorded facts as may be available; e.g. the residents of a
Borstal institute may be studied in this manner, when the history 
of each case may be traced backwards to the first crime and its 
antecedents. Chapin (1947) divides the ‘field’ study into three 
major types: the cross-sectional method which on a given date 
makes and records observations to be analysed; the ‘before-and- 
after’ study which implies criterion observations being made 
before and after a ‘treatment’ so that the effect of the treatment 
may be evaluated; finally the ex post facto type already mentioned
which does not allow for any treatment to be given at will, although 
it does commit one to the time perspective. 
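
The ‘before-and-after’ study lends itself to a simple evaluation of the
paired differences. The following is a minimal sketch, assuming the
same subjects are measured on the criterion before and after the
‘treatment’; the scores are invented and the function name `paired_t`
is an assumption of this example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees of freedom) for the mean paired difference.

    Assumes the differences are not all identical, since a zero
    standard deviation would make the ratio undefined.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical criterion scores for five pupils (illustrative only).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]
t_value, df = paired_t(before, after)
print(round(t_value, 2), df)
```

The resulting t is referred to the usual table with n - 1 degrees of
freedom to judge whether the ‘treatment’ has had an effect.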


(III) Mill’s “Canons of Experimental Enquiry”


Obviously the ‘field’ experiment is not the same thing as the 
laboratory experiment and yet both are governed by the same 
logic. John Stuart Mill (1943) undertook to expound the basic 
methods of experimental enquiry which seek according to him 
to discover ‘causal connections’ and to demonstrate the validity 
of the inductive argument. His “Canons of Discovery” are of 
interest as furnishing the logic of experimental designs. Mill 
postulates five such ‘“‘Canons”, which are described briefly below: 

(1) Mill’s method of agreement lays down the rule that, if two cases
possess a common characteristic and on examination are found 
to be totally dissimilar except for one fact, there is some rela- 
tionship between the observed fact and the common characteristic. 
The difficulty here obviously is to make sure of the complete dissi- 
milarity. If two different species of animals are accidentally 
sealed off from air and die, we can argue by this canon a causal 
link between death (the fact or phenomenon) and the common 
characteristic, respiration. 

(2) The second method is known as the method of difference and
states that if two cases differ in respect of some fact, say the out-
break of a rash, and are found to be similar in all respects except
one single characteristic or condition, e.g. one of two siblings has
eaten contaminated food, we conclude that the condition is related
to the fact in which alone the cases differed. Here again the
condition ‘similar in all respects’ is a tall order and unverifiable.
This is the familiar experimental design of science which equalizes
the conditions by rigorous control and varies one factor for the
experimental case. For example, two guinea pigs of identical
stock and physical condition are kept under the same conditions
but are fed differently. Any increase in size or weight is then
attributable to the factor of feed. Exactly similar conditions and
exact similarity are difficult to provide in the human situation.

(3) Mill’s third method uses both these principles of agreement
and difference together. Consider, for an illustration, two guinea
pigs which are dead and had only the single fact in common of
having eaten a brand of feed; and two other guinea pigs which
survive and have nothing in common except the fact that
they were not given the particular brand of feed. The conclu-
sion obviously is that the feed is fatal for some reason to guinea
pigs of all kinds.

(4) The method of concomitant variation is the familiar correlation
in original habiliments. This merely affirms a link between 
quantities which vary together directly or inversely. 

(5) The method of residues states that if we eliminate all effects 
due to known causes the remaining effects can be attributed to 
residual antecedents. This implies a previous knowledge of the 
majority of causal connections. Obviously this is a condition 
which may not always or easily be satisfied.
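
The first three canons above can be pictured as operations on sets of
circumstances. This is only an illustrative sketch: each case is
reduced to a set of labelled circumstances, which assumes that the
circumstances have been exhaustively listed (the very condition the
text calls a tall order). The function names and the labels are
hypothetical.

```python
def method_of_agreement(positive_cases):
    """Circumstances common to every case in which the phenomenon occurred."""
    return set.intersection(*positive_cases)


def method_of_difference(positive_case, negative_case):
    """Circumstances present only in the case where the phenomenon occurred."""
    return positive_case - negative_case


def joint_method(positive_cases, negative_cases):
    """Common to all positive cases and absent from every negative case."""
    return method_of_agreement(positive_cases) - set.union(*negative_cases)


# Agreement: two animals of different species, both dead.
animals = [{"deprived of air", "mammal", "young"},
           {"deprived of air", "bird", "old"}]
print(method_of_agreement(animals))

# Difference: two siblings alike except for one meal; one has a rash.
with_rash = {"same home", "same school", "ate contaminated food"}
without_rash = {"same home", "same school"}
print(method_of_difference(with_rash, without_rash))

# Joint method: dead guinea pigs share only the feed; survivors lack it.
dead = [{"brand X feed", "stock A"}, {"brand X feed", "stock B"}]
alive = [{"stock C"}, {"stock D"}]
print(joint_method(dead, alive))
```

The sketch makes plain that the canons isolate a candidate circumstance
only relative to the circumstances actually recorded.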

Mill’s ‘canons’ set up the essential logic of experimental designs 
which have with the passage of time and advance of statistics 
proliferated into elegant procedures requiring a sound knowledge 
of advanced mathematics in the user. The currently available 
experimental designs follow Mill’s fundamentals in trying to 
control and manipulate factors and watch not merely their effects 
in isolation but jointly as in various orders of interaction 
variances. Formation of groups and their subjection to various 
‘treatments’ provide the frame within which many kinds of designs 
are constructed. Analysis of variance has been found to be the 
most powerful tool for operating such designs for the study of 
differences among groups and therefore most designs of experi- 
ments now employ this approach, through varied models, to 
experimental work, specially in the field of biology and agri- 
culture. The subject is treated in all its complexity and richness 
in advanced texts like those of Fisher (1951) and Cox and Cochran 
(1950). Analysis of variance and its basic rationale have already 
been considered in their simpler aspect in Chapter III. A fuller 
but not too advanced treatment of experimental designs in analysis 
of variance is available in Lindquist’s Design and Analysis of 
Experiments (1956), to which the reader is referred for details and 
examples.
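
The simplest use of analysis of variance for differences among groups,
the one-way F ratio, may be sketched as follows. The three groups of
scores are invented for illustration, and the function name `one_way_f`
is an assumption; the sketch computes the ratio only and leaves the
reference to the F table to the reader.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of group means about the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores about their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical criterion scores under three 'treatments'.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_f(groups)
print(round(f_ratio, 2), df_between, df_within)
```

For these made-up scores the between-groups mean square is 13 and the
within-groups mean square is 1, so that F is 13 with 2 and 6 degrees
of freedom.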

Mill’s canons accept the basic inductive pattern of all scientific 
enquiry and the impression is widely prevalent that scientific 
knowledge grows by a process of generalization from many parti- 
cular cases. In contrast to this, modern experimental enquiry 
employs what has been called ‘the hypothetico-deductive’ pattern for
its search of the truth underlying observable phenomena. If the
stage of development of a science is such that research is primarily 
directed to a detailed survey of the domain for the purpose of 
exact description, knowledge may accrue by the simple inductive 
principle. But when this initial spade work is done research 
advances to the task of uncovering the principles underlying the 
complexities of the phenomena. Mere repetition or multiplication 
of observations will not be meaningful here. What is required 
is that a theory or a tentative explanation be fitted to the complex 
relationships of facts being studied. A hypothesis is therefore 
set up on the basis of existing knowledge and insight into observed 
facts. This for the duration of the experiment acts as a general 
proposition which in the argument subsumes the particular experi- 
ment as a case in point and so applies to it deductively. If the 
hypothesis is proven then it matures into a thesis; if it is over- 
thrown, then the tentative explanation is discarded as inadequate 
since it does not cover as a proposed general principle the parti-
cular case of the experiment. This approach to experimental
designing is known as the hypothetico-deductive method. 


(IV) Experimental Designs 


The basic requirements of the experimental design as it is 
applicable to the laboratory or the field are that there should be 
some factors, which may be called ‘treatments’, the effect of which 
is to be studied; and there should be groups formed on the basis 
of those factors by a random selection of members from a popula- 
tion for the different treatments; finally there should be a criterion 
measure to evaluate the ‘effects’ of the ‘treatments’. The factors 
of group formation need not be actual ‘treatments’; any character-
istic can form a group; e.g. factors of educational level or profession
or class in school may be used as ‘treatments’, the word having 
been derived from agricultural experiments. In designing experi- 
ments of this kind, which do not manipulate one factor at a time 
in an experimental group holding conditions constant for the 
control, it is necessary to provide for: 


(1) Bias-free estimation of true ‘effects’.

(2) Precision of the estimates with a quantitative index of it.

(3) The testing of clear specific hypotheses of differences, inter-
action etc.

(4) ‘Efficiency’ in the sense of securing maximum results at
minimum cost.


Lindquist (1956) mentions six typical designs which furnish 
the pattern of such experimentation : 


(1) Simple Randomized Design :


This is the kind of one way analysis which was described under
Analysis of Variance in Chapter III.


(2) Treatments x Levels Design : 


For a further refinement, members drawn randomly from several 
levels of a relevant variable are included in each group. This 
is like introducing a factor of lateral classification on a continuous
or graded variable. This kind of stratification by a related
variable within each factor column tends to equate the groups
somewhat in a significant attribute. This rules out the disturbing
effect of the attribute and increases precision. Matched
groups is an extreme form of this equating process because the 
two groups are formed there from members of equated pairs. 


(3) Treatments x Subjects Design :


This method carries the equating to the point of identity, 
so that the same persons are subjected to several treatments 
turn by turn. In this manner inter-subject differences are 
completely ruled out. Unfortunately the requirement that the 
same persons should be repeatedly subjected to several ‘treatments’ 
restricts the scope of the method severely. Obviously the precision 
is better than ever before provided the experimental procedure
has been safeguarded against the effect of repetition of ‘treatments’. 


(4) Random Replication Design : 


This method provides for several replications of the same simple
randomized design so that differences between replications
can also be considered. The errors due to sampling of groups
and repeat experimentation are caught out in this type of
more expensive analysis.


(5) The Factorial Design : 


This has already been considered under Analysis of Variance 
in Chapter III, as two-way and higher order analysis leading to 
double, triple and higher order interaction variances. This is 
possibly the most useful and elegant design, if its conditions 
are met in the data. 


(6) The Groups-within-Treatment Designs : 


If the population is available in many intact groups, cluster- 
sampling (see Chapter III) of such complete groups under 
each ‘treatment’ gives this kind of set-up. In Lindquist’s example 
classes drawn from various schools are taught by different methods.
If there are two methods, I and II, and schools are from A to 
ff and the class 8, then under this design the following type of 
allocation will be possible : 


I 8 4D I A 
it 8 mG FAy 


The cluster sampling by class-in-school makes this kind of design 
rather imprecise. 
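
The difference between the simple randomized design and the
treatments x levels design lies in how members are allotted to groups.
The sketch below is a hedged illustration and not Lindquist’s
procedure: the subject labels, the three levels and the function names
are all assumptions, and a seeded generator is used only so that the
allotment can be repeated.

```python
import random


def simple_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(subject)
    return groups


def treatments_by_levels(subjects_by_level, treatments, rng):
    """Randomize separately within each level of the related variable."""
    design = {}
    for level, members in subjects_by_level.items():
        for treatment, group in simple_randomized(members, treatments, rng).items():
            design[(treatment, level)] = group
    return design


# Twelve hypothetical pupils, already graded into three ability levels.
by_level = {
    "high": ["S1", "S2", "S3", "S4"],
    "middle": ["S5", "S6", "S7", "S8"],
    "low": ["S9", "S10", "S11", "S12"],
}
design = treatments_by_levels(by_level, ["I", "II"], random.Random(0))
for cell, members in sorted(design.items()):
    print(cell, members)
```

Every treatment thus receives the same number of members from every
level, which is the equating of groups that the second design is
intended to secure.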

These are a few of the kinds of designs research employs in the 
biological field and they can be variously combined. When the 
factors or ‘treatments’ are several the complete factorial and 
other designs which require every possible combination of factorial 
categories become prohibitive; e.g. with three factors and 4, 3
and 3 categories in each the total number of combinations is


4 × 3 × 3 = 36
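
The count of combinations in a complete factorial arrangement can be
checked by listing every cell. The category labels below are
placeholders invented for the example.

```python
from itertools import product

# Three factors with 4, 3 and 3 categories respectively (labels hypothetical).
factor_a = ["a1", "a2", "a3", "a4"]
factor_b = ["b1", "b2", "b3"]
factor_c = ["c1", "c2", "c3"]

# Every possible combination of one category from each factor.
cells = list(product(factor_a, factor_b, factor_c))
print(len(cells))  # 36
```

Each added factor multiplies the number of cells, which is why the
complete arrangement soon becomes prohibitive.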


Several designs have been developed to abridge the experimental 
task by a reduction of combinations of factorial categories. In-
complete blocks, Latin square, Graeco-Latin square and Lattice
are some of the more ingenious and better-known of such designs.
Mann (1949), Lindquist (1956) and Goulden (1959) give good
accounts of them which should be consulted by those interested. 
If effects of several factors with sub-categories are to be investigated 
then as a measure of economy the possibilities of the less costly 
and quicker designs should be explored. There may be some 
difficulty in sorting out the details of the design but once the 
initial arrangement is understood the rest of the calculation 
is easy. 
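
The economy of one such design, the Latin square, may be illustrated
by a simple cyclic construction. This is a minimal sketch, assuming
three factors with the same number of categories (rows, columns and
treatments); in practice the rows and columns would also be arranged
at random, a step omitted here. The function name is an assumption.

```python
def latin_square(treatments):
    """Cyclic square: each treatment appears once in every row and column."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


square = latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))

# Cells needed by the square against the complete three-factor arrangement.
cells_used = len(square) * len(square[0])
cells_full = len(square) ** 3
print(cells_used, cells_full)
```

With four categories in each factor the square calls for 16 cells in
place of the 64 of the complete factorial arrangement.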


B. THE CLINICAL APPROACH IN RESEARCH 
(I) The Nature of Clinical Work 


It has been pointed out that the objective in clinical work is a 
thorough-going examination of a particular case or set of cases 
with a view to their understanding for ameliorative and remedial 
action. Research as meaning direct increment to knowledge by 
discovery of principles of general validity is not the immediate 
concern of the clinician. He has an implicit interest in the 
identification of the typical picture of deviants of various kinds 
even if explicitly he is concerned with the particular case alone. 
Clinical research is therefore a secondary activity of a special 
kind which adds to our knowledge of non-normal phenomena. 

For one thing the clinician proceeds from the symptom to the 
cause, from what Cattell (1946) calls surface trait to source trait. 
This preoccupation with ‘depth’ in preference to surface contour 
is a welcome corrective to the more radical behaviourist trend in 
present-day psychology. It is wholly desirable that a quantum 
of sound research activity should be directed to the unmanifest 
dynamisms of the human personality. 

In his analysis of cases the desire of the clinical worker is to 
unravel the complexity of the subject’s condition and pin it down
in one of three major areas of malfunction. The abnormality 
may spring, in the first place, from organic defect so that the defect 
is mainly of the nature of a physical handicap. Secondly he 
has to decide to what extent the failure is purely functional, which 
means that there is nothing in the organic condition to prevent 
the person from reaching a normal state. Slow maturation and 
lack of training in the use of certain functions would then explain 
the failure. Finally it may be that the trouble is psychogenic 
in nature, in which case the obstruction is to be found in the deeper
layers of the mind. 

To be able to analyse the case thoroughly the clinician will need 
to look at his subject in the time-perspective and prepare an 
exhaustive case-history. In cases of mental abnormality all too 
frequently the precipitating causes alone are to be found in the 
present set of circumstances; the roots of trouble are embedded 
deep in the past which must be explored thoroughly. The compila- 
tion of case-history for any given area of ameliorative activity 
is a highly technical job. Burt suggests a scheme which gives a
retrospective survey of the past, a conspectus of the present condi- 
tions and state, and the prospect for the future. For educational 
and psychological problems like backwardness or delinquency, it 
will answer the purpose very well and has been considered in 
detail later. 

The clinical worker is as interested in the case as in his or her 
immediate mental and physical environment. Gestalt psychology 
has stressed the role of the ‘field’ in human behaviour. The
‘field’ of the case i.e. its mental and physical environment is the 
matrix which produces the abnormality. It affords not merely 
the clue to the precipitating factors but the perdurable background 
of the personality to which the person must be readjusted. In 
clinical work therefore the attention to the environment and the 
complex ‘field’ from which the case arises is systematically and 
searchingly directed. Details of procedures of assessing these 
are given in technical literature relating to counselling and clinical 
work. It may be noted that this is not an impersonal, sociological 
type of survey of the subject’s personal background. It is essentially
an appraisal of his vital and dynamic relationships to the persons 
and things in his background. 

All this the clinical worker does to achieve a global picture of 
the personality he is examining. He maps several areas but in 
the end uses the fragments to piece together and reconstruct a 
living person in distress of some kind. The global view is not a 
mere mechanical mass of facts which give only knowledge. It is 
knowledge with understanding which requires insight, intuition 
and sympathy. The clinician uses the instruments of the psycho- 
metrician to complete his picture of the case in exact detail. One 
of the advantages of using standard tests is that the case on them 
is referable to sound norms so that the degree of deviation on 
each may be noted. It is in fact in these deviations that his
abnormality in the clinical picture will be brought out. Witness
for example the use of the Wechsler-Bellevue test in clinical
diagnosis.
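
The reference of a case to norms can be sketched as a conversion of
each score to a standard measure. The tests, norms and scores below
are invented for illustration and are not the norms of any published
test; the cut-off of two standard deviations is likewise an assumption
of this example and not a clinical rule.

```python
# Hypothetical norms: test name -> (mean, standard deviation).
norms = {
    "vocabulary": (50.0, 10.0),
    "arithmetic": (50.0, 10.0),
    "digit span": (10.0, 3.0),
}

# Hypothetical scores of a single case on the same tests.
case_scores = {"vocabulary": 52.0, "arithmetic": 28.0, "digit span": 9.0}


def standard_scores(scores, norms):
    """Deviation of each score from its norm, in standard deviation units."""
    return {test: (scores[test] - norms[test][0]) / norms[test][1]
            for test in scores}


deviations = standard_scores(case_scores, norms)

# Flag tests on which the case lies two or more SDs from the norm.
flagged = [test for test, z in deviations.items() if abs(z) >= 2.0]
print(flagged)
```

The pattern of such deviations over several tests, rather than any
single one of them, is what enters the clinical picture.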

Although the clinician obtains a detailed record on a host of 
relevant variables he is not using each separately as an aspect 
of the subject’s deviation from the norm. He is interested not 
merely in single variables which are germane to his problem but 
more so in their conjoint operation. A subnormality in one trait 
may produce or result from a deviation on some other trait. The 
trait picture is therefore a complex one linked underneath by 
various types of multilateral relationships. It is therefore much 
to his advantage to lay bare these hidden links among artificially 
segregated measures. In fact this is one of the means to achieve 
an integrated global picture of the case.

This psychometric approach made in the light of full 
information regarding the ‘field’ or background, enables him to 
isolate the causes of the malfunctioning or maladjustment by
a process of elimination of causative factors, one by one. He
attends to each trait with concentration and observes the effect 
and in this manner is able to mark out to some extent those 
that show functional relationship with the condition to be 
alleviated. 

None of the elaborate activities so far enumerated take the 
clinical worker anywhere near the straightforward type of research 
which looks for general propositions in collected data. It is true 
that since his primary concern is the case, he can use his knowledge 
of a particular one only as a secondary means to research. The 
‘paper’ that a clinician presents is often the detailed report on a
case of unusual interest. The ‘paper’ illustrates the method used 
by clinical workers to ‘explain’ the case and justifies the saying that 
the exception proves the rule. This ‘explanation’ may amount 
to the statement of a new principle regarding the working of the 
human mind. It is likely to be of immense value to professional 
colleagues. The case may in fact be accounted for in terms of 
existing concepts in spite of its peculiarity or irregularity. For 
this reason Hall (1955) maintains that Freud offered general 
rules of psychology which by their interplay could produce neurosis 
and psychosis also. Obviously such a ‘paper’ can be a contribution 
to research. 
(II) The Case Study Method


The case-study method of research in education and psychology 
subjects a limited number of cases to statistical analysis, on the 
basis of the small sample theory. The cases may be gleaned 
from clinical records or compiled in the course of clinical examina- 
tions. In a class of material like this one cannot sample to order 
and the analysis is perforce of the ex post facto kind. The small 
Ns would inflate sampling error but this tendency is counter- 
balanced by the larger differences between the control group 
of normals and the clinical group of deviants. Therefore, the 
probability of inferences being correct maintains its level. The
use of statistics in medicine has improved small sample techni-
ques greatly and they can be used quite effectively on clinical
material of any kind.
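
The way in which a large difference offsets a small N can be seen in
the ordinary t ratio for two independent small samples. The sketch
below uses the pooled-variance form and assumes roughly equal variances
in the two groups; the scores of the ‘control’ and ‘clinical’ groups
are invented, and the function name is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees of freedom) using the pooled estimate of variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two cases")
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / df
    # Small groups make this standard error large ...
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    # ... but a large difference between the means can still outweigh it.
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical test scores: five normals and five clinical cases.
control = [10, 12, 11, 13, 14]
clinical = [5, 7, 6, 8, 4]
t_value, df = pooled_t(control, clinical)
print(round(t_value, 2), df)
```

With only five cases in each group the ratio is still 6 on 8 degrees
of freedom for these made-up scores, because the difference between
the group means is large relative to the spread within the groups.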

Clinical research of the kind briefly described above leads to 
definitive and tangible results which illumine dark corners of 
human life and have immediate practical application. Two 
kinds of results of such research may be mentioned as examples 
of achievement in this area. Frequency of cases showing common
surface characteristics leads to the identification and naming 
of a new ‘type’ comparable to a syndrome in medicine. This 
identification gradually leads to an analysis of the etiology and 
related, remedial and preventive measures. The repeated pro- 
bing underneath the surface familiarizes the clinician with the 
obscure entities in ‘the Sheol of the unconscious’ and leads him 
to put forward hypotheses about mental dynamics and mechanisms 
and put them to test in the course of clinical work. In fact he 
may find enough material in this exploratory and tentative manner 
to formulate a regular theory regarding mental functions. The 
stupendous work of Freud and his followers is a proof of this possi- 
bility. W. C. Olson (1939) considers some of the aspects in which 
the case-study makes contributions to generalized knowledge 
in a field. 

Probably the chief difficulty of the clinical worker as a servant 
of research is his peculiar personality which has to be capable
of good rapport and intense preoccupation with the particular.
To that extent he is deprived of objectivity and the steady pursuit
of the general truth which holds for all time regardless of
exceptional cases. He has to achieve the paradox of promoting
and improving the law by studying mainly those who are
beyond it. 

The case-study is based on case-history and both are used in 
case-work which includes ameliorative procedures also. Good 
and Scates (1954) give considerable space to a consideration of 
this method, as a method sui generis and not necessarily as a method 
of research. Case study is a delicate job and requires in the worker 
not only a lot of experience but also a fund of natural flair and 
aptitude. It involves one in an essentially human and sensitive 
relationship with a total stranger. To be able to establish good 
rapport quickly it is necessary to enter deeply and fully into the 
entire background of the personal life of the subject. This requires 
a case-history to be prepared as a preliminary to case-study. The
technical name for collection of such information is anamnesis.
Hadley (1958) breaks up such a record into educational, family,
developmental, sociological and occupational histories. Shaffer 
and Lazarus (1952) provide a slightly different kind of organiza- 
tion of material which would serve the same purpose. 

Burt’s scheme for case study provides for collection of informa- 
tion under the following heads: 

(A) Past history, personal: conditions of pregnancy and
birth; landmarks of physical and mental growth—dates of talking,
walking, dentition, school, puberty; other notable events—acci-
dents, illnesses, operations; family life; pre-natal investigation into
ancestry and post-natal history. Sources of data: patient, parents,
associates, family members, teacher, doctor.

(B) Present conditions: (1) Patient’s environment—concern
not with income, rent, number of rooms but with mental facts.

(2) Patient himself—(a) Physical condition—adenoids, worms,
general health, bodily ailment, disease or disturbance of the
nervous and glandular systems having a relationship to mental state.
(b) Mental condition: (1) Intellectual aspect: (i) Innate capacities:
general intelligence and special abilities.

(ii) Acquired attainments: education, special skill or knowledge,
general information and culture.

(2) Temperamental aspect: (i) Innate elements: specific
tendencies; general emotionality. (ii) Acquired characteristics:
complexes, neuroses, interests, attitudes or sentiments, character.

These categories are too logical and compartmental to be very
meaningful except as a frame-work of objectives to be achieved.

Good and Scates (1954) suggest the following heads for collection
of facts on a case.

(1) Examination 

(A) Psychophysical: vision, hearing, speech, neuro-muscular
coordination.

(B) Health: height-weight ratio, nutrition, teeth, general 
condition. 

(C) Educational: reading, number, handwriting, composition,
apperceptive mass, capacity for sustained application, attitude. 

(D) Mentality: general intelligence through verbal, non- 
verbal tests. 

(2) Health history. 

(3) School history: (A) Promotions (B) Kind of work done
(C) Mobility (D) Quality of schools attended (E) Relations with
teachers.

(4) Family history and home conditions: ancestry, parents, 
siblings; economic status and history; cultural resources; relations 
within home; attitude of parents towards society; control. 

(5) Social history and contacts: attitude towards own religion; 
mixing with children of the same age; undesirable company; 
sex history; delinquency. 

So long as a comprehensive and complete picture is available 
in the end, it does not matter under what categories the data on 
the case is organized. The interview is the method par excellence 
of obtaining data on a case in clinical work. As a means of 
securing data it has been considered in detail in the next section. 


C. METHODS OF COLLECTION OF DATA 


It was pointed out that in the eruditional type of research such 
as we meet with in the areas of history and philosophy, books 
and historical material are the main sources of the data. In the 
scientific kind of research whether of the statistical, experimental 
or clinical kind the data consists chiefly of accurate information 
of some specific nature about the subject of enquiry. The sources 
are frequently recorded responses under various kinds of controlled 
stimulation, and other records or material which give us a clue 
to the things we are interested in. We include in such sources
the objective test which yields a quantitative score on the one
hand, and personal records like diaries and letters or even reported
dreams on the other. The differentia of this type of data is that it is vitalistic
and gives us information about concrete living persons who repre- 
sent a class of phenomena. It is neither revelatory of the past 
like the historical material nor is it representative of the systematic 
thinking of a great mind, as is the case in the philosophical 
type of research. Such data is available by the following 
means: 


(I) The Content Analysis 


This term is used for the analysis into formal and logical cate- 
gories of documentary material of all kinds. Vocabulary in text 
books furnishes a good example of this kind of classification by fre- 
quencies. The term is, however, of a general import and any 
kind of documents including letters or diaries or other kinds of 
personal writing can be subjected to a close study with a view to 
analyse the contents into any categories that may be either pre- 
determinate or such as may evolve in the course of the examination. 
Reported dreams can be content-analysed for symbolism and ‘dra- 
matized’ processes of wish fulfilment. Similarly opinion of experts 
freely expressed in lengthy screeds may be examined for common 
points and points of disagreement. The trend of research activity 
in a domain can be similarly discovered by a study of the papers 
and research work published. 

Content analysis uses the enumeration type of statistics, wherefore
Good and Scates call it ‘measurement through proportion’. In
effect, it does not rise above the simple nominal type of
classification and the statistical work involved is of the simplest
kind, not going far beyond the significance of differences in
frequencies and the tests of the Null hypothesis. Content analysis
is quantitative in a very limited way and its main forte is to break
up the complex content of a type of document into categories that
have bearings on other facts being studied, e.g. the opinions of
experts can be analysed and read against the actual literature that
is available. Content analysis should therefore excel in evolving
useful, relevant and economical categories into which the content
matter can be subdivided. The statistical treatment is generally of
the simplest kind. Berelson (1956) and Good and Scates (1954) offer
a good compact statement of the present status of content analysis
as a research tool.
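
The classification of vocabulary by frequencies mentioned above can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions: the function names and the sample passage are hypothetical, and a "word" is taken simply as a run of letters. The second function expresses each frequency as a share of the total, in the spirit of 'measurement through proportion'.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Tally how often each word occurs in a passage (case-insensitive)."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def proportions(counts):
    """Express each category frequency as a proportion of the total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical passage standing in for a page of a text book.
passage = "The cat sat on the mat. The dog sat too."
frequencies = word_frequencies(passage)
shares = proportions(frequencies)
```

The same tally works for any nominal categories, for instance coded themes in letters or expert opinions, provided each unit of content is first assigned to a category.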


(II) The Observation 


Observation is an obvious means of collecting data. This 
could be participant or unobserved. In the latter one has to 
view the subject or subjects through a one-way glass partition. 
Participation is only possible when the observer’s presence is not
likely to modify the behaviour of the subject or subjects. For 
example, children can be observed by an adult person provided 
he is being ignored by them like a piece of furniture. If 
his presence puts them on their best behaviour the observation 
alters the situation radically. 

Observational study is useful when we are concerned with facts
of overt behaviour themselves or as expressions of inward
mentality. Firstly observation must be directed to specific acts
and utterances. This means that before observational sessions
are begun what is to be observed has been considered and decided.
Secondly, observations must be recorded promptly, preferably
on the spot, and failing that allowance should be made for the
selective nature of the observer’s memory and its arbitrary failure.
Thirdly observations are made on a time scheme which specifies
the duration and hour of the sessions. The observation must
be made in a form which gives either a qualitative value to the
act observed or expresses it as a tally of frequency. It has been
found that observation is a skilled job and requires considerable
practice if not also some aptitude. One has to be argus-eyed
and sharp and also ambidextrous to be able to observe and record
at the same time. Aggressive acts of nursery school children are
an example of the data that needs to be collected by unobtrusive
observation. It should, however, be remembered that observa-
tion as a method of obtaining research data is a taxing and meticu-
lous kind of activity very different from the haphazard way in
which observations are made in every day life. The records of
observations are like tallies and are not likely to involve in general 
complicated or advanced statistical treatment. The number of indi- 
viduals to be observed depends on the facts to be observed and the 
ability of the observer. In clinical work even one person can be 
observed as in psychodrama and diagnostic play activity of children. 
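
A record of observations kept as a tally of frequency on a time scheme might be organized as in the following sketch. It is an illustration only: the session labels, the categories of act and the function name are hypothetical, and each entry is assumed to have been noted on the spot as a pair of session and act.

```python
from collections import Counter, defaultdict


def tally_by_session(records):
    """Build a frequency table of observed acts for each timed session."""
    table = defaultdict(Counter)
    for session, act in records:
        table[session][act] += 1
    return table


# Hypothetical on-the-spot notes: (session on the time scheme, act observed).
records = [
    ("Mon 10:00-10:15", "hits"),
    ("Mon 10:00-10:15", "snatches toy"),
    ("Mon 10:00-10:15", "hits"),
    ("Tue 10:00-10:15", "hits"),
]

table = tally_by_session(records)

# Frequencies pooled over all sessions.
totals = sum(table.values(), Counter())
```

Deciding the categories of act before the sessions begin, as the text requires, is what keeps such a table simple enough for the elementary statistical treatment described.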


(III) The Questionnaire


This is a popular means of collecting all kinds of data in research 
and also runs the greatest risk of misuse. It is the easiest thing 
on earth to string together a large number of questions. The 
trouble starts when a helter-skelter mass of responses which defy 
systematization and orderly analysis are brought up in the net. 
The most salutary advice therefore regarding the use of the ques- 
tionnaire is that the questioner must first decide why the questions 
are being asked and secondly he or she must also know exactly 
what is to be done with the responses. The one will decide the content 
and form of the questions and the second their utilization in the 
subsequent treatment. Makers of questionnaires often tend to regard 
it as a ticket of leave comparable to the prerogative of a cross exa- 
mination. The truth of the matter is that in spirit the question- 
naire is not too remote from the objective test and needs as precise 
and collective a treatment as the latter would. 

The response to a questionnaire may be entirely open, provid-
ing room for an elaborate answer, as when the ques-
tions are of the form, “What is your reaction to the proposal for
compulsory primary education?” This kind of question is too
general to allow for a sufficient number to be framed on a specific 
topic and yields responses that call for a minor sort of content 
analysis. Questions which restrict response to a specified extent 
are the standard form in a questionnaire which is normally 
divided into several sections according to the aspect or theme 
being considered. Alternatives provided may indicate degrees
of a characteristic as in the question: “Do you consider your
school

overstaffed
adequately staffed
understaffed?”

In some kinds of questions a miscellaneous “any other” category
is given over and above the expected and foreseen alternatives, e.g.
“You consider selection to be most successful on the basis of

Interview
Written test
Previous examination records
Experience
(Any other)”

The data of most questionnaires will tend to be of the enumera-
tive kind and has to be so visualized in advance for
treatment and use.
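
Since such data are enumerative, their treatment can be planned before a single questionnaire is sent out. The sketch below tabulates the replies to one closed question and computes a chi-square against the Null hypothesis that all alternatives are chosen equally often. It is an illustration under assumptions: the counts are invented, the function names are hypothetical, and the equal-frequency hypothesis is only one of several that a researcher might test.

```python
from collections import Counter


def tabulate(responses, alternatives):
    """Count replies to a closed question, keeping the printed order."""
    counts = Counter(responses)
    unexpected = set(counts) - set(alternatives)
    if unexpected:
        raise ValueError(f"replies outside the given alternatives: {unexpected}")
    return {alt: counts.get(alt, 0) for alt in alternatives}


def chi_square_equal(counts):
    """Chi-square against equal expected frequencies; returns (chi2, df)."""
    n = sum(counts.values())
    k = len(counts)
    expected = n / k
    chi2 = sum((obs - expected) ** 2 / expected for obs in counts.values())
    return chi2, k - 1


alternatives = ["overstaffed", "adequately staffed", "understaffed"]

# Invented replies from 60 hypothetical respondents.
responses = (["overstaffed"] * 10
             + ["adequately staffed"] * 20
             + ["understaffed"] * 30)

counts = tabulate(responses, alternatives)
chi2, df = chi_square_equal(counts)
```

The obtained chi-square is then referred to a table of chi-square for the stated degrees of freedom.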

There are certain precautions to be observed in the use of a 
questionnaire. Firstly the responses should be at least within 
each section of a uniform kind so that statistical manipulation 
is possible. Secondly the wording should be simple, clear and
unsuggestive. This last adjective requires that no anti- or pro- 
sentiment should be imported into the statement, for a moralistic 
attitude is bound to disturb the free response of the subjects. A 
question like: “Do you hang around cinema houses in the 
evening?” is liable to scare off the respondents by its moralistic 
tone. The sting is in the words, ‘hang around’. This could be 
more neutrally and objectively recorded as “Do you visit cinema 
houses in the evenings?”

The great problem with questionnaires is to get them answered. 
Even if one makes them as easy as one can, respondents will not 
bother. One must understand the psychology of the average
person’s reaction and attitude towards a questionnaire and pro- 
vide for some motivation to respond and remove all possible im- 
pediments in the way of it. The responses should truly represent 
the ideas and feelings of the respondent and should not be mecha- 
nical or feigned. Enclosing postage-bearing cover for return 
after completion is a kind of subtle moral pressure disguised as 
a courtesy which often works better than mere verbal persuasion.

The advantages of the questionnaire are obvious; it permits 
group administration and is adaptable to any objective. The 
stages through which the questionnaire-based research passes 
are briefly: specifications as to objective, length, form etc.; cons-
truction and pre-testing or try-out; follow-up in case returns are
not immediately collected; recording of returns on master-sheet; 
tabulation and analysis; and interpretation. The sampling has 
to be taken care of to free the returns from the vitiating effects 
of bias. 


(IV) The Checklist, Schedule, Inventory 


The check-list, schedule and inventory serve the same purpose 
as a questionnaire but are syntactically different in form. The
schedule is sometimes regarded as synonymous with the question-
naire but generally the difference would consist in the form giving
a pattern of choices for response to the respondent. As a result 
the schematic presentation is much more rigid and the response 
more controlled. The choice is frequently forced within the given 
categories. The questionnaire is looser and less restrictive and 
more like a string of independent questions. If maximally orga- 
nized it will be a schedule like Thurstone’s Temperament schedule. 
Check-lists and inventories are nearly the same things in the matter 
of form. Both give lists of statements to which response is made 
on a given set of ratings by placing a tick or check-mark, or by 
other kind of choice of offered alternatives. The words again 
suggest a uniformity of material of response which makes subse- 
quent assessment and scoring easy. Personality, adjustment, 
problems and interest inventories are familiar among instruments 
of mental measurement. It may be noted that in passing from 
a free device like the questionnaire to such strait-jacketed
material we are in effect passing from a relatively free-style sampl- 
ing of persons’ responses to a variety of questions, to concentrated 
and much more organized attempts at a quantitative assessment 
of persons’ interests, personality make-up, temperament and such 
other psychological traits and attributes. The questionnaire 
is just a means of collecting any kind of information or data; inven- 
tories are a measurement device by contrast. 


(V) The Interview


The clinician’s favourite method of eliciting facts is the inter- 
view. The questionnaire as a piece of printed paper is some-
what impersonal and creates a slight mental barrier. The inter-
view aims at a good rapport and the overthrow of that barrier.
It is by its nature slower and more time-consuming and when 
entrusted to more than one person liable to what is known as 
‘the interviewer error’. In the case-study approach it has definite 
advantages of good insight and comprehensive grasp of signifi- 
cant detail and the general background. The interview can be 
regarded as a set questionnaire personally and verbally
given. In this case it is regarded as completely structured except
that the structuring never does mean, to a good, practical inter-
viewer, rigid adherence to the set questions and their wording.
In the semi-structured interview the general outline to be followed
is indicated but within each section the questioning is free and
full according to the choice of the interviewer. The free inter- 
view is geared to certain large objectives and some procedural 
design in the mind of the interviewer but is otherwise 
untrammelled. 

The interview is one of the most important tools of social 
research and is dependent for its success on the personal ability 
of the interviewer more than any other method of data collection. 
There are certain types of enquiry—like questions on intimate 
personal life and sex or marital harmony—which cannot be under- 
taken satisfactorily by the questionnaire or check list methods. 
A good interviewer who is able to establish sympathetic contact 
with the subject can get responses which otherwise cannot be elici-
ted. This requires a great deal of natural ability, acquired skill
and considerable sensitiveness in the interviewer; and when only
one interviewer is available the number of cases to be examined
is severely reduced.

‘The standardized interview’ simulates the more purposeful
questionnaire rather closely and ‘incorporates a basic principle
of measurement’, and is more reliable whereas the unstandardized
interview is capable of deeper probing and is more flexible and
life-like (Maccoby and Maccoby 1956). Much research has
been directed to the evaluation of the interview and its technique, 
as a tool of data collection and themes like the wording of ques- 
tions, indirect and direct approaches, attitude of the interviewer, 
rapport and recording of data have been repeatedly investigated. 
The questions should be normally free of double meaning, short
and clear, definitive in regard to the conditions the respondent
must assume to make a reply. They should either indicate
all alternatives in response or none at all, and should
draw on the respondents’ personal experience in preference
to generalities. They should be such as will not put up ‘ego defen-
ces’ and encourage feigning and pretence. There are many
subtle ways of doing this. They have been very lucidly indicated
by the Maccobys whose article the interested reader will find useful.
The recording of data from an interview is hedged with diffi-
culties. If one writes during the course of the interview the rap-
port is weakened and the recording interferes with the smooth
and natural conduct of the questioning and may also encourage
suppression of undesirable facts. Writing up from memory has
its perils of wayward, unconscious selection. Coding and rating
procedures simplify and expedite the recording. Tape-recording 
of the entire interview can also be resorted to but is likely to be 
expensive of time and money in the reduction of the play-back 
to a brief and relevant record. 

Taken all in all the interview is the inevitable method of data- 
gathering for the clinical type of research and should be employed 
for all case-studies with a full consciousness of its potentialities 
and risks. 


(VI) The Test and the Scale 


Mental tests and scales are frequently employed to secure more 
exact quantitative measures for research purposes. The nature 
of these has been considered in detail in Chapters III and IV. In 
general they will yield the interval type of measures which permits 
the derivation of a wide variety of statistics of great utility. Unless 
the nature of the data prevents the employment of exact measures, 
they will normally be employed for research of the statistical type 
in the area of individual differences. Even in clinical and experi-
mental research they hold an important place and are frequently
used in preference to other less accurate types of sources of data.
Their reliability is known or can be established in a sample and
this gives them an additional advantage. The test yields a score
or a familiar quantity like the I.Q. and the scale gives the positions
of given stimuli or performance on a defined and scaled dimension. 
Both quantities can be treated statistically for further manipula- 
tion to yield desired critical statistics. 


D. SMALL SAMPLE STATISTICS 


In experimental and clinical case-study methods the number 
of cases is limited severely, specially so in the latter where each 
case requires hours of study. The number of cases on which the 
investigation is based increases in the following order in different 
types of methods: Case-study (Clinical work), laboratory experi- 
ment (Psychology), ‘field’-experiment (Sociology), sample-survey 
(Statistical study) and complete survey (Census-type of study). 
In experimental work also only a limited number of persons can 
be given trials and the N may be based on trials or persons, but
is never very large unless accumulated over long periods of time
which itself may introduce a factor of change. 

It has been found that statistics based on small Ns when repea-
tedly derived do not behave the same way as when they are based
on large Ns. The change occurs gradually in passing from small
Ns to large ones. The conservative view is that Ns of 50 and more
make a large sample. In groups of such sizes if the same statis-
tic, say the mean, is repeatedly calculated from random samples,
the means will show a normal distribution i.e. ‘the sampling dis-
tribution of means’ will be normal. As against this the distribu-
tion of means based on samples of sizes smaller than 50 (the less
conservative limit being 30 and less) departs from normality in
a systematic manner. The t already introduced in Chapter III
gives the critical ratios for such distributions.

The small sample formulae differ from the general formulae in
using N-1 instead of N. This N-1 term is the degree of free-
dom, already mentioned under the χ² in Chapter III. Decreasing
N by unity can obviously have no noticeable effect on large
samples but when the Ns are less than 50 or 30 it begins to play
an increasing part in the arithmetic of the formula. This means
also that a sample of one case each cannot be used, and samples
of N=2 leave us where we are with the derived statistics so far
as the standard error is concerned: e.g.

$$\sigma_M = \frac{\sigma}{\sqrt{N-1}} = \frac{\sigma}{\sqrt{2-1}} = \sigma$$

means that the standard error of mean for samples of N=2 is
equal to the standard deviation of the original distribution itself.
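
The arithmetic of the N-1 term can be checked with a short sketch. It assumes the small sample standard error of the mean is the standard deviation divided by the square root of N-1, as in the passage above; the function names and the figures are hypothetical.

```python
import math


def small_sample_se_of_mean(sd, n):
    """Standard error of the mean using N-1 in the denominator."""
    if n < 2:
        # With one case N-1 is zero, so no standard error can be formed.
        raise ValueError("a sample of one case cannot be used")
    return sd / math.sqrt(n - 1)


def large_sample_se_of_mean(sd, n):
    """General (large sample) form using N in the denominator."""
    return sd / math.sqrt(n)


sd = 10.0
se_n2 = small_sample_se_of_mean(sd, 2)      # equals the SD itself
se_n101 = small_sample_se_of_mean(sd, 101)  # 10 / sqrt(100)
gap_small = small_sample_se_of_mean(sd, 5) - large_sample_se_of_mean(sd, 5)
gap_large = small_sample_se_of_mean(sd, 500) - large_sample_se_of_mean(sd, 500)
```

With N=5 the two forms differ appreciably, whereas with N=500 the difference is negligible, which is the sense in which decreasing N by unity matters only for small samples.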
The term degree of freedom means that in using the observa- 
tions of a sample for deriving statistics as valid estimates we tie 
ourselves down to that particular sample. If another sample is 
taken then to produce the same size of derived statistic we can use 
any new values but will be compelled to use a last value such as 
enables us to attain the original value of the statistic e.g. 


Sample 1    3+5+2+2 = 12, Mean = 12/4 = 3
Sample 2    1+4+1+? = 12, Mean = 3
Sample 3    6+2+1+? = 12, Mean = 3
Sample 4    5+1+3+? = 12, Mean = 3

In samples 2 to 4 if we wish to keep the mean at 3 we are free
to change any three of the values only and the fourth thereafter 
is automatically fixed by the requirement that the mean has to be 3. 
The values we are not free to change in accepting the mean 3 
as correct for all samples are 6, 3, and 3. A derived statistic is
not sacred like the obtained observations since it is merely a func- 
tion of those values and can change in the next sample. When 
we use it as a real entity to derive further statistics we allow one 
value among the observations to lose its freedom. If we choose 
to endue a mathematically derived entity with reality such as 
possessed by an observation then some other value among the ob- 
servations has to become a shifting ghost for subsequent samples, 
the values of N-1 members determining the size of this Nth case.
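
The loss of one degree of freedom can be verified directly for the four-value samples used in the illustration. The sketch below is only a check on that arithmetic; the function name is hypothetical.

```python
def forced_last_value(free_values, mean, n):
    """Value the Nth observation must take so that the sample mean is kept."""
    if len(free_values) != n - 1:
        raise ValueError("exactly N-1 values may be chosen freely")
    return mean * n - sum(free_values)


# The three freely chosen values in samples 2 to 4 of the illustration.
free_choices = {
    "Sample 2": [1, 4, 1],
    "Sample 3": [6, 2, 1],
    "Sample 4": [5, 1, 3],
}

forced = {name: forced_last_value(values, mean=3, n=4)
          for name, values in free_choices.items()}
```

Once the mean is accepted as 3 for samples of four cases, the last values come out as 6, 3 and 3, so that only N-1 = 3 observations in each sample are free to vary.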

Formulae of standard errors of statistics derived from small
samples differ from those to be used for large samples. Fisher 
has evolved many formulae for the standard error and critical 
ratio of statistics derived from small samples. These formulae 
are given in most standard statistical texts (Guilford 1950) and 
cover such cases as the Null hypothesis test of a coefficient of corre-
lation, the t for difference between means, correlated or other-
wise, of difference between proportions correlated or otherwise
and F for difference between variances. These small sample 
critical statistics imply normal distributions of the variable being 
considered in the parent population. The clinical and experi-
mental worker who is using Ns of 50 or less (in a liberal view, of 30
or less) should use the small sample form of the formulae for critical 
statistics. 
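
As one instance of such a small sample critical statistic, the t for the difference between two uncorrelated means may be sketched as follows. This is a minimal illustration and not a reproduction of any formula as printed in a particular text: it assumes the usual pooled-variance form with N1+N2-2 degrees of freedom, and the scores of the clinical and control groups are invented.

```python
import math


def t_uncorrelated_means(group_a, group_b):
    """Pooled-variance t for the difference between two independent means.

    Returns (t, degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sums of squared deviations about each group's own mean.
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    se_difference = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_difference, df


# Invented scores for five clinical cases and five normal controls.
clinical = [12, 15, 11, 14, 13]
control = [8, 9, 7, 10, 6]

t, df = t_uncorrelated_means(clinical, control)
```

The obtained t is referred to a table of t at the stated degrees of freedom to decide whether the Null hypothesis can be rejected.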

E. THE RESEARCH REPORT 


Some indication of the formal sections of a research report has 
already been given in Chapter II in connection with the historical
type of research. In a general sense those rubrics will apply to 
the report on a scientific kind of research also. Although a 
report may have its unique features, by and large, the heads 
under which the report can be adequately made are as 
described here. 

The initial pages give such details page-wise, as the title and 
the sub-title followed by the name of the researcher or the organi-
zation conducting the research, as also the legend like “submitted
to the ________ in fulfilment of the requirements for the degree
of ________” which ties it to an academic objective. A page
is devoted formally to ‘Acknowledgements’ where the worker 
or team mentions with gratitude the assistance, financial or other 
sort, received during and for the research. The succeeding page 
gives a ‘Table of Contents’, whereafter the text of the thesis can
be begun. 


(1) Introduction 


The introduction provides room for the comparatively free
handling of the preliminary facts. In general it must cover three 
issues. Firstly the significance and importance of the enquiry 
must be brought out. This would include a definition of terms. 
It is obvious that research has to be worthwhile. This does not 
mean that a strictly pragmatic philosophy is adopted in the selec- 
tion of problems. Problems in ‘pure’ fields are equally important 
from the point of view of addition to existing knowledge. Even 
revisionary and corrective research is not barred. The introduc- 
tion should indicate the nature of the new knowledge that is expec- 
ted from the research. Its value to the domain concerned will 
be obvious to any expert. Secondly the introduction must briefly
survey the research conducted on the subject to date and its
results. This is a historical resume that requires considerable
bibliographical research through back numbers of journals, pub- 
lished indices of research and the literature in libraries. This 
will also show the present status of our knowledge of the subject 
from where the proposed research goes forward to cover a new 
sector. Finally the scope and purpose of the research undertaken 
needs to be brought out. This means delimiting the area and 
defining the objectives exactly. Matter over and above these 
may go into the introduction according to need but these are an 
absolute minimum. Some persons would want to include the 
procedure to be used in the first chapter. An outline of the scope 
of the research should then be offered in general terms, since details 
are to precede the data themselves. 


(II) The Data 


This chapter gives details of how the data have been collected. 
The nature and form of the data have to be clearly described, 
the sources and methods of their collection are to be enumerated
and the sampling design has to be indicated and justified. Having 
justified the proposals for these a detailed account is to be given 
of the data actually collected e.g. with the help of the diary main- 
tained it is to be stated that on such several dates schools or suburbs 
A, B, C, D were visited and such measurements or observations
were made. This is a kind of account of what the researcher 
actually did to produce his harvest of data. This is followed by 
a presentation in a summary form of the classified and condensed 
data. If the tables are too numerous they can be relegated to 
an appendix. Some people would include the general plan of 
the study on the problem to form the first part of this chapter. 
A compromise can be made by giving at the end of the introduc- 
tion the broad strategy only and providing its details with justi- 
fication at the commencement of the second chapter. 


(III) Analysis of Data 


This is a stage of statistical manipulation and finesse. There 
will be alternate methods of treatment available at every turn 
and the text of the chapter gives evidence of wide familiarity with 
available procedures and a rational and critical consideration 
of those that apply to the form of the data. The discussion must 
include a justification of the techniques preferred for subjecting 
the ordered data to further treatment. 

This preliminary discussion is followed by the presentation 
of the data in altered form after it has been subjected to analysis 
e.g. the R-matrix of correlations now appears as an F-matrix 
of factor loadings and variance table showing communality and 
specificity of factored tests; or again the matrix of criterion and 
predictors’ inter-correlations reappears as the inverse and the 
regression coefficients. There is room in this chapter for high 
class expertise which the more brilliant worker will exploit to the
full. It may be remembered that research is not carried out 
with a knowledge of generalities which are semantically within 
anybody’s reach but on the basis of a knowledge of detailed proce- 
dures which require a thorough mastery of the techniques involved. 
This analysis finally leads to a set of critical statistics which yield 
definitive results. 
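
The passage from a matrix of inter-correlations to its inverse and the regression coefficients can be illustrated for the simplest case of one criterion and two predictors. The sketch below is an assumption-laden example rather than a general routine: the correlations are invented, the function name is hypothetical, and the weights obtained are standard-score (beta) weights.

```python
def beta_weights_two_predictors(r12, r1c, r2c):
    """Beta weights for two predictors from their correlations.

    r12: correlation between the two predictors
    r1c, r2c: correlation of each predictor with the criterion
    Returns (beta1, beta2, squared multiple correlation).
    """
    determinant = 1.0 - r12 * r12
    if determinant <= 0:
        raise ValueError("predictors are perfectly correlated")
    # Inverse of the 2 x 2 matrix of predictor inter-correlations.
    inverse = [[1.0 / determinant, -r12 / determinant],
               [-r12 / determinant, 1.0 / determinant]]
    # Regression weights: inverse matrix times the criterion correlations.
    beta1 = inverse[0][0] * r1c + inverse[0][1] * r2c
    beta2 = inverse[1][0] * r1c + inverse[1][1] * r2c
    r_squared = beta1 * r1c + beta2 * r2c
    return beta1, beta2, r_squared


# Invented correlations for illustration.
beta1, beta2, r_squared = beta_weights_two_predictors(r12=0.5, r1c=0.5, r2c=0.4)
```

With more than two predictors the same steps apply, but the inversion is best left to a tested matrix routine.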


(IV) The Results and Interpretation


The results of the critical tests made at the end of Chapter III 
are stated there in the statistical form and need to be restated here
in a descriptive style. The story told by the results of statistical 
tests is restated at length in technical but non-statistical terminology 
i.e. while the terminology peculiar to the field of research like I.Q., 
threshold, factor, socio-economic status, depressed personality or 
under-achievers, is used the statement ‘significant at less than 1 per
cent level’ is replaced by a formal statement of the finding. In 
this manner the critical tests summarised at the end of Chapter 
III are reviewed and their meaning and importance explained. 

Having stated formally the ‘findings’ it is the business of the 
researcher to make them meaningful in the context of the setting 
of his theme. It is not enough to say what the statistical tests 
reveal. It is necessary to go back to the original objectives of 
the research and see how the answers received enable us to achieve 
our objectives. This is the task of interpretation: relating the 
finding to the rest of the field of knowledge. Especially in education
a relation to existing practices or policies is to be sought and
subjected to criticism. In fine, the interpretation of the results of
research means showing what exactly the new knowledge implies
and what its significance to the experts and specialists is. There
is a degree of freedom in the interpretation of findings which no 
other stage of the scientific type of investigation possesses. This free- 
dom the report writer must use with caution so that what he de- 
duces does not furnish ground for polemics. Above all the results 
of scientific research must stay beyond the pale of controversy 
and mere verbal argument. Probably the story about Cordell 
Hull, one time Secretary of State of the U.S.A., is a fine illustration 
of this spirit of cautiousness. Hull was always cautious in speech,
striving for scientific accuracy. One day on a train a friend pointed
to a flock of sheep grazing in a field. “Look, these sheep have
just been sheared”, he said. Hull studied the flock. “Sheared
on this side anyway”, he admitted.


(V) Conclusion and Summary 


The conclusion pin-points the pragmatic aspect of the results 
and the interpretation. It applies the research findings to the existing
state of affairs, implying changes in outlook, understanding,
policy, practices and life. The conclusion in short shows the net 
result of the research and its impact on existing knowledge. The 
summary is a condensation in succinct, simple, straightforward 
and non-technical language of what has been done, found and 
established reasonably beyond dispute. All the verbiage 
and argot is cut out and the reader is able at a glance 
to understand what has been achieved by the particular 
research. 

Appendices are used to furnish such details or matter of specialist 
interest (like mathematical proof) as are not conveniently or rele- 
vantly included in the text. They may be several according 
to need. Sometimes the raw material form of data, like tabulated 
values, is also included in such appendices although should this 
be considerable it would be advisable to supply it in a separate 
volume. Discrimination between what belongs to the text 
and what should be relegated to an appendix requires consi- 
derable acumen. Similarly it is difficult to decide when a foot- 
note is to be inserted, as against a reference to a book, in the body 
of the text. Normally any topical side-issue which needs a little 
explanation or comment is amplified in a footnote. These should
be kept to a minimum for smooth readability. Too many digressions
are likely to impede the progress of the main argument which 
must hold the centre of interest. In the erudite kind 
of research based on books footnotes are likely to be 
copious. 

The appendices are followed by the References or the Bibliography.
This is an alphabetically arranged list of the references
quoted throughout the text. The referencing is done in one of 
the following ways: 


Statement X (19) 
Statement X (Spearman 1926) 


In the former case against the serial number 19 the name of the 
author, year and article (with name and volume of journal) or 
book are given in a set form; in the latter case which is to be pre- 
ferred as more mature and elegant the same details are given 
against the name Spearman under the alphabetical list of references.
A reference from Thomson’s Factorial Analysis of Human
Ability, 1951 edition, reads like this: Swineford, Frances, 1941,
“Comparisons of the Multiple-Factor and Bifactor Methods of
Analysis”, Pmka., 6, 375-82.


This means that the name of the author is Frances Swineford, 
the year 1941, the title of the paper published in Psychometrika 
abbreviated to Pmka., volume 6, pages 375-82. This is an acceptable
style of referencing except that presently the titles of papers
and articles in journals are given without capitals barring the first
word. Thus a reference in Gulliksen reads: Wissler, Clark, (1901)
The correlation of mental and physical tests. Psychol. Monogr., 3, No.
16, 1-62. In this style, only the titles of books carry capital
letters. 


The references are followed by an Index, which most Indian
publications omit and which is a very useful and necessary appendage
to have, although it may not figure as importantly in a research
report as in a book. The index gives an alphabetical
break-up of the subject matter and all significant topics
and themes, subsuming their sub-divisions and indicating the
pages where they are treated. An idea of this can be formed
by going to the end of any sound book on education or
psychology.


Finally a word regarding the style of exposition. The scientific 
worker is not out to give his style an exercise. He uses language 
as a transparent medium of communication, and by preference 
uses plain rather than evocative words. Much of his terminology 
is already loaded with technical and specific content. Repetition 
and verbiage, and a desire to practice graces of style are not sui- 
table for his report, which must be clear and brief. He is prolix 
only in displaying his facts, analysis and results. The explanatory 
part of the text is short. It is also advisable to eliminate from 
the writing the first person singular. Instead of saying “I then
tested the sample for flaws”, it is better to say “The sample was
then tested for flaws”. Even in the ‘Acknowledgements’ the
more conservative writer would say “the writer wishes to express
his gratitude to ...” and thus avoid the personal “I”. This
means that the reporting of this kind of research is cold and
matter-of-fact and the appeal is to the evidence put forward and the
inferences it unfolds under treatment.
APPENDIX


A NOTE ON “ACTION RESEARCH”, AND DECISION
AND INFORMATION THEORIES 


“ACTION RESEARCH” is a term which occurs frequently in the work of the more
pragmatically oriented sociologist and social worker and needs some elaboration.
Persons engaged in large-scale reformist activity are inclined to use the term
somewhat loosely to imply a scientific evaluation of the outcomes of deliberately
planned changes in policy and procedures applied to a field of social activity. 
As the words suggest, the idea is to assess the results of continuing action, and 
to take these into account in planning the next step or course. This does away 
with the need to set up a crucial experiment which would yield, under controlled 
conditions, a clear result, on the basis of which the effect of a “treatment” is
judged. “Action research” thus removes even the limited restrictions of the
“field experiment”. In the social field persons cannot be treated as expendable,
experimental guinea pigs; they are important in themselves and cannot be made
to run any risks. For example, it is not possible to subject some groups of
children in schools to “treatments” or deliberately imposed conditions which
are expected, in the hypothesis, to do them no good. And yet change for the
better must be considered. “Action research” is thought of as an approach
in such a situation. In brief it is an exploratory first step in a direction which
is expected, on some hypothesis, to yield good results. Under this scheme a
reform may be proposed on rational grounds and a well-considered change in
some practice introduced. The effect of this is watched over a specified time
interval. Here we are interested in the rate of progress in a predetermined
direction. Progress is recorded in any agreed and acceptable form periodically. 
This requires some criterion measures in terms of which progress can be noted. 
The rates of progress become comparable over long intervals of time for successive 
groups like the classes in schools proceeding from one standard to another or 
age-groups growing older year by year. This plan is comparable to the 
‘treatments by subjects’ design of experimental set-up except that there is no
systematic sampling for variance. It may be noted that if there are
contemporary and parallel, comparable groups with different “treatments” then
we are conducting an ordinary “field experiment”. Action research is thus
the simplest design of research, although in actual implementation it may call 
for considerable resourcefulness. Its main difficulty consists in evolving suitable 
and objective criteria of progress, as on account of its peculiar setting the temptation
to read commendable progress in subjective evaluations is very great, and
progress itself too complex for clear appraisal.
Normally elementary statistical indices, considered at some length in this book, 
will solve the problem of treatment of data arising out of action research. But 
Decision theory in statistics is probably the kind of manipulation most suited 
to a teleological and time-oriented approach like this. In the text discussions 
of sequential sampling, and OC and power curves touched upon this theory which
has a formidable rationale of its own independently of the classical statistics
which it redefines as “the science of decision making under uncertainty”. Decision
theory deals with questions of available choice of actions, the possible “states of
nature” (or facts), a value system to consider consequences of certain actions
and the selection of a “strategy” or decision. Obviously Decision theory is a
very work-a-day affair and has none of the abstract perfectionism of the parametric
statistics; and yet it is a much more realistic and practical approach to
the problems which action research must present to the worker. Bayes’ rule
plays a central part in the development of the probability theory for decision. 
If the decision regarding change we take now is going to have a chain reaction 
—one effect leading along the time line to another—then we are concerned with 
what are known as stochastic processes which provide for sequences of events 
governed by laws of probability. 
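
The elements of Decision theory named above (actions, states of nature, a value system and a strategy, with Bayes’ rule supplying the probabilities) can be put together in a minimal sketch. Every number and label below is hypothetical and chosen only for illustration: a school must decide whether to continue or revert a reform after observing improved scores, and the value system is expressed as a table of losses.

```python
def posterior(prior, likelihood):
    """Bayes' rule: revise beliefs about the states of nature after an observation.

    prior[s] is the probability of state s before the observation;
    likelihood[s] is the probability of the observation if s is true.
    """
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}


def bayes_action(loss, beliefs):
    """Choose the action with the smallest expected loss under the given beliefs."""
    expected = {a: sum(beliefs[s] * loss[a][s] for s in beliefs) for a in loss}
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical states of nature and prior beliefs about the reform.
prior = {"effective": 0.5, "ineffective": 0.5}
# Assumed chance of observing improved scores under each state.
likelihood = {"effective": 0.8, "ineffective": 0.3}
# Assumed value system: loss incurred by each action in each state.
loss = {
    "continue": {"effective": 0.0, "ineffective": 10.0},
    "revert": {"effective": 6.0, "ineffective": 0.0},
}

post = posterior(prior, likelihood)
best, expected = bayes_action(loss, post)
print(post, expected, best)
```

The choice depends as much on the loss table as on the probabilities, which is why the value system has to be stated explicitly before a strategy can be selected.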

Information theory is another recent development in statistics which utilizes 
basically the theory of stochastic processes. Information theory is also concerned 
with the central idea of uncertainty and actually defines information as “that
which removes or reduces uncertainty”. Information or uncertainty is measurable
in terms of “bits” (short for binary digits) which refer to ‘yes’ and ‘no’ responses
to the smallest number of questions which will enable one to identify a particular
thing. Each response is a “bit” of information. If out of a total of m
things one particular thing, say P, is to be identified, then


$m = 2^{H}$,


when the questions admitting ‘yes’ or ‘no’ answers are so asked that the alternatives
are split in half every time. H is then a measure of uncertainty. This theory develops
also the concept of redundancy as conformity to a pattern or law. The wholly
expected is wholly redundant from the point of view of information. The basic 
facts of relatedness of orders of phenomena common to the concepts of correlation 
and analysis of variance are restated in terms of shared information in this theory. 
Man’s mind and nervous system are being conceived of as models of information 
transmission systems utilizing many channels, and many ingenious applications 
of this theory have been made to psychology in the last few decades, e.g. the 
quantification of the sequences in learning processes or of the patterning of 
gestalten. For a complete statement they require whole chapters to themselves. 
Both Decision processes and Information theory are mathematically somewhat 
advanced and are of recent growth. The ordinary “action research” enthusiast
is likely to be at the farthest remove from the statistical acumen and finesse 
demanded by these procedures. Research in his field could however benefit 
immensely from an acquaintance with both these branches of mathematical 
statistics. It is not possible to deal with these at any length in an appendix. To 
those who are interested in them H. Chernoff and L. E. Moses’ Elementary Decision 
Theory (Wiley), and F. Attneave’s Applications of Information Theory to Psychology 
(Holt, Rinehart, Winston) will prove encouraging and stimulating as introductory 
reading. 
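
The relation between the number of things m and the uncertainty H stated earlier in this note, m equal to 2 raised to the power H, can be checked with a short sketch. It counts the ‘yes’ or ‘no’ questions needed when each question splits the remaining alternatives in half. The function names and the choice of eight alternatives are assumptions made for this illustration.

```python
import math


def uncertainty_bits(m):
    """Uncertainty H in bits when one of m equally likely things must be identified."""
    return math.log2(m)


def identify(items, target):
    """Find target by repeatedly asking whether it lies in the first half.

    Returns the thing identified and the number of questions asked.
    """
    candidates = list(items)
    questions = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        questions += 1  # one 'yes' or 'no' response, i.e. one bit
        candidates = first if target in first else candidates[half:]
    return candidates[0], questions


# Eight alternatives, so H should be 3 bits and three questions should suffice.
found, asked = identify(range(8), 5)
print(found, asked, uncertainty_bits(8))
```

When m is not a power of two the halving cannot be exact, and the number of questions actually asked may exceed H by a fraction of a question on average.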